The Journal of 
Experimental Education 


A periodical report of scientific investigations relating to child development, 


Vol. XXVII December 1958 No. 2


CONTENTS 
PAGE 
A Re-Examination of Personality Structure in Late Childhood, and Devel- 
opment of the High School Personality Questionnaire 
Raymond B. Cattell, Richard W. Coan, Halla Beloff 73 


The Behavior of Teachers and the Productive Behavior of Their Pupils: 

I. “Perception” Analysis Morris L. Cogan 89 
The Behavior of Teachers and the Productive Behavior of Their Pupils: 

II. “Trait” Analysis Morris L. Cogan 107


The Use of Free Response Data in Writing Choice-Type Items 
Desmond L. Cook 125 


Analysis of Cornell Orientation Inventory Items on Study Habits and
Their Relative Value in Prediction of College Achievement 
Parviz Chahbazi 135 


The Relationship of Peer Group Rating to Certain Individual Perceptions 
of Personality J. W. Ramey 143


$7.50 A YEAR PUBLISHED QUARTERLY $1.90 A COPY 


Published by Dembar Publications, Inc., 
Madison 3, Wisconsin. 
Second class postage paid at Madison, Wisconsin 




EDITORIAL BOARD 


A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison 6, Wis.


[...] the City of New York,
[...] Board of [...] of the City of New York, [...] York, 110 Livingston [...]
Editorially responsible for materials on curriculum construction, [...] June.


CONTRIBUTING EDITORS 


[...], Professor of Education, University of [...]
[...] Education, University [...]
[...] College of Washington, Pullman, [...]
[...] Board, Princeton, New [...]
[...] Columbia University, New York, New York.
[...] for Teachers, Nashville, Tennessee.
[...] of Psychology, University of [...]
[...] Professor of Educational [...]
[...] of Educational [...], University of [...]
[...] Child Welfare, University of California, Berkeley 4, California.
[...] A. Lincoln, Consulting Psychologist, Halifax, [...]
[...] University of [...], Duluth, [...]
[...] Director [...]
[...] T. Rock, Jr., Professor of Psychology, Head of [...]
[...] Corporation, New York 18, New York.
Paul W. Terry, Professor of [...], University of Alabama, University, [...]
[...] College, Columbia University, New York [...]
[...] of Psychology, Ohio State [...]
[...] University, Tokyo, Japan.
[...] M. Walker, Professor of [...], Columbia University, New York [...]
Beth L. Wellman, Professor of [...], Child Welfare Research Station, State University [...], Iowa City, [...]
Paul A. Witty, [...] Director of Psycho-Educational Clinic, School of Education, [...] University, Evanston, Illinois.
Ernest R. W[...], Professor of Education, [...] University, New York City.
[...] and Measurement, University of Nebraska, [...]
[...] Psychology, [...] Boston, Massachusetts.


[...] Director, Research Services, [...]
H. H. Remmers, Professor of Educational Psychology, [...] University, [...] Indiana.
Editorially responsible for materials on [...] research.
Editorially responsible for materials on teaching and supervision [...] methods, published [...]
Arthur [...]
[...] development, published [...] December.


JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, December 1958) 


A RE-EXAMINATION OF PERSONALITY STRUCTURE 
IN LATE CHILDHOOD, AND DEVELOPMENT OF THE 
HIGH SCHOOL PERSONALITY QUESTIONNAIRE 


RAYMOND B. CATTELL 
RICHARD W. COAN 
HALLA BELOFF 
University of Illinois 


1. The Setting of the Problem in Personality 
Structure Research 


UNTIL VERY recently in the history of factored personality measurement, there has been only one factored questionnaire instrument for use at a child level. This instrument, first named the Junior Personality Quiz (JPQ), was derived from research reported by Cattell and Gruen (15) and by Cattell and Beloff (12). Psychologically, the JPQ factors generally bear a close relationship to factors which have also been found at the adult level, and this relation has been proved by correlation (12). If such an instrument is to become widely used, for either practical or research purposes, there is, however, a scientific need to re-examine the structure of its component factors on an independent sample (for experience shows some fluctuation in simple structure). There is also a practical need to extend the scale into an equivalent B form, to give the increased reliabilities possible from longer testing.

Immediate practical aims were, therefore, (1) checking the factor structure, particularly the rotational resolution, (2) intensifying the scales, (3) extending the scales, and (4) checking the identification of the factors against those established at later ages, and possibly adding one or two further factor scales that might prove important. However, for the benefit of the reader unfamiliar with the several previous related publications (5,9,14,15,17), it should be pointed out that this research on personality structure in questionnaire responses in the age range 12 through 17 years is also a planned part of a more general basic research program concerning personality and motivation structure. This broader program has had as its objectives the determination of personality structure, by factor analytic and related methods, in a coordinated attack (1) over the three possible media of behavioral observation, namely, L-data, or life behavior in situ; Q-data, or response to questionnaires, from introspective self-evaluation; and T-data, or objective, non-self-evaluative test response behavior; and (2) over the developmental age range, by cross-sectional structurings at the adult level, at 14 years (as here), at 10 years, at 7 years, and at 4 years of age. The general coordination of L-, Q-, and T-data findings is discussed elsewhere (17), and this account will digress from the questionnaire findings in the 12-16 year range only to the point of referring to their integration with Q-data findings at neighboring ages.


2. Design of the Experiment 


The present design called for the invention of 
an extensive new pool of questions, out of which a 
B form of the high-school questionnaire could 
emerge. It was proposed to evaluate these new 
items while re-determining the factor structure 
of the original questionnaire. The new question- 
naire emerging from this work will be called the 
HSPQ or High School Personality Questionnaire. 
The original JPQ questionnaire takes its author- 
ity from a factor analysis of 295 items (103 di- 
rectly factored) on 333 eleven- and twelve-year- 
old boys and girls, by Cattell and Gruen (15), and 
a subsequent questionnaire construction analysis 
by Cattell and Beloff (12), which first positively 
identified the childhood factors in terms of adult 
16 PF factors (18). 

The resulting JPQ test has been a valuable in- 
terim instrument, permitting such research on 
basic personality factors as the determination of 
their nature-nurture ratios (13), and their predic- 
tive power in regard to school achievement when 
abilities are held constant (9,23). Although this 
work has confirmed the psychological meaning 
given to the factors, and shown that they behave 
with the expected independence, it has not allayed 
the suspicion that questionnaires with children 
need greater length than with adults to achieve 




satisfactory reliabilities. Therefore, further 
work was indicated, both to extend and intensify 
scales, and to check the factor identifications. 

Design of the new items was carried out jointly by Cattell and Coan in America and Beloff in Britain, with the aim of obtaining sets suitable for international scales. Guided by the aim of uniformly covering the personality sphere (9), an
initial total of 450 items was constructed. This 
size of the initial pool was planned in anticipation 
of loss by attrition largely at four points: (1) pos- 
sible rejection for psychological unsuitability, 
when examined by judges using our factor con- 
cepts and word familiarity standards, (2) unde- 
sirable extremity of yes-no cut, beyond 90%-10%, 
(3) absence of sufficient factor loading, (4) mal- 
distribution of items among factors, or wrong di- 
rections of response for particular factors. It 
was desired to have 260 survivors, 130 for each 
form. Three hundred and sixty-five survived the 
first examination for psychological suitability and 
these were administered to 200 Belfast children 
ranging in age from nine to sixteen years, the 
lower age being taken to ensure readability for 
such backward children as might be found among 
the twelve-year-olds, i.e., at the lower limit of 
the 12-17 year range for which the test was in- 
tended. 

Phi coefficients were worked out between these items and the original JPQ factors, in an 11 x 365 table. By taking only those newly invented items which had a more central yes-no cut than 90%-10%, and which proved to have adequate correlation with one or more of the 11 dimensions of the existing JPQ, the total was reduced to 251 (plus 132, constituting the non-intelligence items to be added from the existing JPQ (12)). Parenthetically we may note that this procedure makes no
effort to represent the five or six less defined 
factors in children’s personality questionnaire 
responses which are at present known to exist— 
through the work of Cattell and Gruen (15) and 
others, as well as from the present factoring (9, 
21)—though with lesser variance than that of the 
accepted factors. However, at the present stage 
of research it seems desirable to make really 
sure of the measurement of the factors of largest 
variance. Actually our procedure allowed for in- 
clusion of any new factors of outstanding potency 
and in the end two new factors were added to the 
scale. 
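The item screen described above (a yes-no cut more central than 90%-10%, plus adequate phi correlation with at least one existing JPQ dimension) can be sketched as follows. This is a minimal illustration only, assuming both items and factor scores have already been reduced to 0/1 codes per child; the function names and the 0.20 phi threshold are hypothetical, since the text does not state what counted as "adequate correlation" at this stage.

```python
import math

def phi(x, y):
    """Phi coefficient between two equal-length sequences of 0/1 codes."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

def keep_item(item, factors, min_yes=0.10, max_yes=0.90, min_phi=0.20):
    """Screen one new item. Only the 90%-10% cut comes from the text;
    min_phi is a hypothetical threshold.

    item    -- 0/1 answers of each child ("yes" = 1)
    factors -- one list of 0/1 factor codes per child, per JPQ factor (assumed dichotomized)
    """
    p_yes = sum(item) / len(item)
    # Reject items whose yes-no cut is 90%-10% or more extreme.
    if not (min_yes < p_yes < max_yes):
        return False
    # Keep the item only if it correlates adequately with at least one factor.
    return any(abs(phi(item, f)) >= min_phi for f in factors)
```

For example, an item answered "yes" by exactly half the children and coinciding with a factor code gives `phi = 1.0` and is kept, while an item answered "yes" by 9 children in 10 is dropped on the cut rule alone.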

Having thus obtained a pool of 251 new items, effective both as to high involvement in personality and freedom from eccentricity of cut, we proceeded to the factor analysis. This followed the design of packaged or parcelled factor analysis which we have proposed and tested elsewhere (7,8). This is one of the two major methods (the other being matrix meshing) available for obtaining factor loadings for a larger number of variables than can be factored directly in a single matrix. Parcelled factor analysis is indicated as the preferred technique when sound prior knowledge exists about the nature of the general factor structure and the cluster affiliations of most items.

In this case we were confident of the general factor structure of the JPQ and in any case planned to represent it in the new factorization by carrying two markers from each factor. The two markers were constituted by putting the 12 items for each factor into two short (6-item) equivalent scales. Twenty-four such markers for the twelve factors (intelligence was not included) were to be the chief landmarks for structure, and they constituted the first twenty-four variables in the new matrix. We also knew the approximate factor correlations for the 251 items, from the rectangular matrix above, and, on this basis, these items were placed in 46 homogeneous “parcels” averaging six items per parcel. These approximately homogeneous, but certainly not necessarily factor-pure, “parcel” scales were similarly scored as unit variables, bringing the total of parcel variables for factoring to 46 + 24 = 70. It will be observed that this design does not prejudice the discovery of factor structure afresh in this analysis, for the markers are swamped by new variables which might give new factors, but it does provide a real and independent check on the old rotation.
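Parcel formation as described (items grouped by their approximate factor correlations into short scales averaging six items, each scored as a single unit variable) can be sketched roughly as below. This is an assumed reconstruction, not the authors' procedure: each item is simply affiliated with the factor it correlates with most strongly, reverse-keyed if that correlation is negative, and dealt into parcels of near-equal size. The names `build_parcels` and `parcel_score` are hypothetical.

```python
def build_parcels(item_factor_r, size=6):
    """Group items into roughly homogeneous parcels (hypothetical helper).

    item_factor_r -- dict: item id -> list of approximate correlations with each factor
    size          -- target number of items per parcel
    Returns a list of (factor_index, [(item_id, sign), ...]).
    """
    by_factor = {}
    for item, rs in item_factor_r.items():
        # Affiliate each item with the factor it correlates with most strongly.
        k = max(range(len(rs)), key=lambda i: abs(rs[i]))
        sign = 1 if rs[k] >= 0 else -1
        by_factor.setdefault(k, []).append((item, sign))
    parcels = []
    for k in sorted(by_factor):
        members = by_factor[k]
        n_parcels = max(1, round(len(members) / size))
        # Deal members out so that parcel sizes differ by at most one item.
        for p in range(n_parcels):
            parcels.append((k, members[p::n_parcels]))
    return parcels

def parcel_score(responses, parcel):
    """Unit-weighted parcel score for one child: one point per item answered
    in the direction that correlates positively with the parcel's factor."""
    return sum(responses[item] if sign > 0 else 1 - responses[item]
               for item, sign in parcel)
```

Because a parcel is built only from rough affiliations, it is homogeneous but, as the text stresses, not necessarily factor-pure; the factoring of the 70 unit variables is what settles the structure.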

As usual in parcelled factor analysis, it was planned to determine the factor structure in terms of these 70 parcels; to estimate the individual's factor scores from the parcel scores; and then to undo the parcels and correlate the individual constituent items with these factor scores. Thus approximate loadings of the items in the factors could be obtained without the impracticably vast labor of directly factoring a 383 x 383 correlation matrix (383 = 251 new plus 132 JPQ items). The particulars of these calculations are given in the following section.


3. Confirmation of the JPQ Factor Structure 


The battery of 383 items was administered in 
two sections, one day apart, to 168 12- and 13- 
year-old boys and girls in the schools of Terre 
Haute, Indiana. Score distributions on each of 
the seventy short scales were dichotomized at the 
center and a 70 by 70 ‘‘parcel’’ correlation ma- 
trix calculated by means of the phi coefficient. 
Fifteen factors were extracted by the centroid 
method, and since the usual group of criteria (6) 
on completeness of extraction suggested twelve 
or thirteen factors, thirteen were actually pre- 
served for rotation. 
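The first computational step above (splitting each short-scale score distribution at its center and intercorrelating the resulting 0/1 codes by phi) is sketched below. This is a minimal illustration, assuming "at the center" means at the median and that ties at the median are coded 0; the centroid extraction and rotation that follow are not sketched. Function names are hypothetical.

```python
import math
import statistics

def dichotomize_at_median(scores):
    """Split a score distribution at its center: 1 above the median, 0 otherwise.
    (Coding ties at the median as 0 is an assumption.)"""
    med = statistics.median(scores)
    return [1 if s > med else 0 for s in scores]

def phi(x, y):
    """Pearson correlation of two 0/1 sequences, i.e. the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def parcel_phi_matrix(parcel_scores):
    """parcel_scores -- one list of children's scores per short scale.
    Returns the square matrix of phi coefficients among the dichotomized scales
    (70 x 70 in the study)."""
    coded = [dichotomize_at_median(col) for col in parcel_scores]
    return [[phi(a, b) for b in coded] for a in coded]
```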

Rotation began by the trial vector method (24), 
the markers from the JPQ giving rough initial di- 
rection to the first eleven vectors and the remain- 
der being placed for high communality and as 




near orthogonality as possible. However, this was merely a starting point, for no fewer than 25 overall rotations, guided by visual plots, were then made toward maximized and statistically significant oblique simple structure. Rotation ceased when the percentage of variables in the hyperplanes of the whole configuration had climbed, at the 20th rotation, to a plateau at 65%, and when five further rotations failed to produce any increase. This is far beyond the P = .01 level of significance of simple structure by Bargmann's test (2), and from the visual plots alone it is clear that there is a very good structure. The fact that the absolute proportion in the hyperplane does not reach the 68 or 70% level we have achieved in the 16 PF questionnaire (8) can probably be ascribed, in view of later findings, to the lesser reliability of individual questionnaire item responses with children, blurring the hyperplane from ±.10 to a somewhat broader band.
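The two quantities that governed when rotation stopped (the percentage of loadings inside the ±.10 hyperplane band, and a plateau that five further rotations fail to raise) can be written down as follows. This is a sketch under assumptions: the count is taken over all variable-by-factor entries of the rotated matrix, and the plateau rule is our paraphrase of the stopping criterion. Bargmann's significance test is not implemented here. Function names are hypothetical.

```python
def hyperplane_percentage(loadings, band=0.10):
    """Percentage of variable-by-factor entries lying inside the hyperplane band (|v| <= band).

    loadings -- rows are variables, columns are rotated factors
    """
    values = [v for row in loadings for v in row]
    inside = sum(1 for v in values if abs(v) <= band)
    return 100.0 * inside / len(values)

def has_plateaued(history, patience=5):
    """True when the last `patience` rotations failed to raise the hyperplane
    percentage above the best value reached before them."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    return max(history[-patience:]) <= best_before
```

Widening `band` raises the count, which is one way to picture the point that less reliable item responses with children blur the hyperplane into a broader band.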

The unrotated matrix (Vo), simple structure 
rotated matrix (Vp), and transformation matrix 
(Xp) are preserved at the American Documenta- 
tion Institute, where microfilm copies may be 
obtained by ordering number 72459, while the cosine matrix is shown in Table IV (a).

On examining the simple structure rotated 
matrix, Vp, from which Table I is extracted, it 
was found that the two markers for each of the 
original twelve JPQ factors had faithfully ap- 
peared at the head of the factors in this new, in- 
dependent simple structure rotation. There was 
some misplacement in the case of old factors 2, 
7, and 8 as shown in Table I, for in these three 
one marker had appeared correctly and one had 
gone astray into another factor. Except for the 
three off-diagonal values shown in the table, all 
other off-diagonal loadings were negligible. As 
will be seen, the old markers always retain their 
characters and remain the highest values in the 
row and column for each factor. These old sali- 
ents were, of course, joined and exceeded by new 
salients when the new items from the remaining 
46 parcels were similarly correlated (after ‘‘un- 
doing’’) with the factor estimates, as described 
more precisely in the next section. 


4. Obtaining the Correlations (Concept Validities) of Items with Factors


At this point we had confirmed, to the degree 
indicated, that the eleven dimensions of person- 
ality found to span the childhood personality 
sphere, in the original JPQ research (12,15), 
were correct in themselves and still representa- 
tive, except for two new factors, even among a 
new and wider range of response items. There 
would seem to be little doubt, as far as questionnaire response material is concerned, that these thirteen factors represent the most stable and comprehensive functional unities in personality structure at the high-school age.

Now it remained to determine the validity of 
every one of the 383 usable items against these 
thirteen uniquely rotated factors. This required 
that we estimate the scores of each of the 168 chil- 
dren on each of the thirteen factors and then cor- 
relate all individual test items with each and 
every factor. It will be recalled that each of the 
short scales or ‘‘parcels’’, in which the deter- 
mined factor structure was expressed, is some 
six items in length. Since the salient parcels in 
any factor were, roughly, of the same order of 
loading, and since the items in the parcels were, 
roughly, homogeneous, the factor estimate was 
made by giving one point for each item and thus, 
generally, a 0 to 36 score to each factor. (It has 
been shown (1) that the extra trouble of fine weight- 
ing in these situations is unjustifiable.) 
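The estimation and "undoing" steps just described (one point per item in a factor's salient parcels, then correlating every individual item with every factor estimate) amount to the following computation. This is a minimal sketch, assuming responses are already keyed so that 1 scores toward the factor; the names `factor_estimates` and `item_factor_matrix` are hypothetical. Note that an item lying inside a salient parcel also contributes to the estimate it is correlated with, so its correlation carries some part-whole inflation; the sketch does not correct for this.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def factor_estimates(responses, salient_items):
    """Unit-weighted factor estimates: one point per keyed item in the factor's salient parcels
    (six 6-item parcels would give the 0 to 36 range mentioned in the text).

    responses     -- list of dicts, one per child: item id -> 0/1 (already keyed)
    salient_items -- dict: factor name -> item ids in that factor's salient parcels
    """
    return {f: [sum(child[i] for i in items) for child in responses]
            for f, items in salient_items.items()}

def item_factor_matrix(responses, estimates):
    """'Undo' the parcels: correlate every single item with every factor estimate
    (13 x 383 in the study)."""
    items = sorted(responses[0])
    return {f: {i: pearson([child[i] for child in responses], est) for i in items}
            for f, est in estimates.items()}
```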

It will be recalled that our aim was both to “intensify” the existing JPQ scales (recasting the JPQ as Form A of the new test, called the HSPQ)
and to develop a second equivalent B form of the 
HSPQ. The term ‘‘intensification’’ has been spe- 
cialized in factor scale research (7,8) to mean 
the replacement of lower loaded items—relative 
deadwood—by new items of higher loading, togeth- 
er with more exact determination of the rotation 
position of each factor itself. The latter is im- 
portant; for simply to add items correlating more 
with the factor as originally defined in, and by, 
the scale would be gaining homogeneity while per- 
petuating with greater thoroughness the systemat- 
ic error of the original factor location (9)! Inten- 
sification of a scale should thus include renova- 
tion not only of items but also of exactness of fac- 
tor determination, as made possible in later re- 
search by higher loadings and more hyperplane 
stuff (6). As to item replacement, the replacing item should naturally have a loading exceeding that of the item it replaces by more than the standard error of the difference.
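The replacement rule can be made concrete as below. The text does not say which standard error was used, so this sketch assumes the large-sample standard error of a correlation, (1 - r²)/√N, and combines the two as if the correlations were independent; that is a simplification, since both loadings come from the same children. The helper names are hypothetical.

```python
import math

def se_r(r, n):
    """Large-sample standard error of a correlation coefficient, (1 - r**2) / sqrt(n)."""
    return (1 - r * r) / math.sqrt(n)

def should_replace(r_old, r_new, n):
    """True if the candidate item's loading beats the old item's by more than the
    standard error of the difference. Treats the two correlations as independent,
    which is an assumption: both are measured on the same sample."""
    se_diff = math.sqrt(se_r(r_old, n) ** 2 + se_r(r_new, n) ** 2)
    return (r_new - r_old) > se_diff
```

With the study's N of 168, a candidate loading .40 would displace an item loading .20 (the difference of .20 exceeds a standard error of about .10), whereas a candidate at .35 would not displace an item at .30.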

Parcelled factor analysis (7,8) sometimes 
stops at undoing parcels to correlate items with 
the single factor in which the parcels are loaded. 
In the present experiment we did not follow this 
shortest of designs, but correlated items with all 
other factors too. This is to be preferred, if 
time permits, because the initial assignment of 
items to parcels is only on the basis of rough evi- 
dence of homogeneity and there is certainly a pos- 
sibility that some single item might load another 
factor more highly than that to which its parcel 
as a whole belongs. Moreover, knowledge of the 
loadings of all items on all factors is necessary 
at the stage of final assignment of items to factor 
scales, when one wishes to prepare buffered 
scales (16) by arranging mutual “suppressor” action (16) among factors alien to the factor scale concerned. Accordingly, these relations were calculated, in a 13 x 383 correlation matrix. This is a large matrix, but still less than a thirtieth of that required for the ordinary design of factor analysis of the same data, which would call for a matrix too large to fit any known electronic computing program. The saving by parcelled factor analysis is achieved, however, by incurring the degree of inaccuracy involved in getting item correlations with an estimated factor (of about 0.8 reliability) instead of determining loadings with the 100 percent accurate “pure” factor vector, as in typical factorization.

TABLE I: J.P.Q. PAIRS AS APPEARING ON THE NEW FACTOR STRUCTURE

As stated, the 383 total was necessary be- 
cause of the plan to bring both the 132 items of 
the existing JPQ and the 251 new items into the 
same matrix. For, of course, it is necessary 
to know which of the existing items in the old 
scales may be of sufficiently low correlation to 
be replaced, and for this purpose it would not be 
safe to go back to their correlations in the orig- 
inal researches (11,14) for these would not have 
strict sampling comparability with the potential 
replacing items in the present sample nor would 
correlations with true and estimated factors be 
comparable. Actually, the 13 x 383 matrix 
shows a remarkably good confirmation of the ap- 
propriateness of assignment of the original JPQ 
items. 

Thus at this point we can conclude: (a) that the original factor structure was essentially sound; (b) that nine or ten out of twelve items in each old scale continue, in the new sample and rotation, to load the factor to which they were originally assigned, with the right sign and significant loading. Consequently, the intensification described in the next section had the task of replacing only two or three items in each factor by more substantially factor-correlated items.*


5. Construction of the High-School Personality 
Questionnaire: Obtained Reliabilities 


As stated initially, a residue of sufficiently 
loaded items (other than intelligence factor 
items) was the target of the above analysis, giv- 
ing ten items for each of the 13 personality fac- 
tors for each of the two (A and B) equivalent 
forms. Starting as we did with 450 items (383 at 
factorization), we found the above attrition pro- 
cesses in factoring finally gave us just a narrow 
margin of sufficiently loaded (0.20 or higher) 
items beyond this necessary total. The construc- 
tion of the questionnaire from this point, there- 
fore, followed the usual canons for a good multi- 
ple factor scale test, as follows: 


1. Each factor scale was built from the items most highly loaded on that factor, always having, at any rate, a significant loading well clear of the hyperplane widths.


2. Suppressor action was introduced. Principally this meant, where good items also had significant loadings on other factors, an attempt never to have more than one item on the required factor that loaded (in the same direction) on the same irrelevant factor. Whenever possible, a loading on an irrelevant factor Y, introduced into Factor Scale X by the use of an item “a”, is suppressed by finding an item “b”, also loaded on factor X as is item “a”, but loaded additionally on Y with the opposite sign to item “a” (16). See diagram for an illustration.

3. Each factor was planned to be scored on items completely independent of those scored for any other factor, to avoid spurious correlations among factor scores due to sharing specifics and errors.

4. Position and “yes tendency” sets were eliminated. As the work of Berg (2), Cronbach (20), and the present writers (9:245) has shown, the tendency to agree rather than disagree is (a) systematic and (b) personality correlated. Consequently, for each ten-item factor scale in each form, five items were chosen to contribute positively with an “a” or “yes” answer, and five with a “b” or “no” answer.

5. The total pool of items of satisfactory validity for each factor was sorted into ten items for
the A and ten for the B form, in a way to give 
equal total validity (equivalent ‘‘wanted factor’’ 
saturation) to these equivalent forms. 

6. A fourteenth scale—the intelligence dimen- 
sion—was added to the thirteen personality factors 
by taking the most “g” saturated and equal difficulty-spaced items from each of four sub-tests of
the senior author’s Intelligence Scales (4) for chil- 
dren of this age. 

7. The 140 items (130 plus 10 intelligence 
items) per form were finally arranged as to fac- 
tors in a modified cyclical order, to separate 
items pertaining to one factor and to give maxi- 
mum convenience in stencil scoring of a machine 
answer sheet. The positions of the factor items, 
and the positively scoring alternatives for each 
factor were also arranged to be identical on the 
A and B forms. 
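Canons 4 and 5 together (five "yes"-keyed and five "no"-keyed items per ten-item scale, and A and B forms of equal total validity) can be met mechanically in a simple case. The text says only that the sorting was done "in a way to give equal total validity"; the "snake" dealing below (A, B, B, A, A, B, ...) within each keying direction is a swapped-in technique chosen for illustration, and the sum of absolute loadings is an assumed index of wanted-factor saturation. Function names are hypothetical.

```python
def split_forms(pool, per_key=5):
    """Divide one factor's item pool into equivalent A and B forms (illustrative).

    pool    -- list of (item_id, loading, keyed_yes) tuples
    per_key -- items per form keyed in each direction (5 "yes" + 5 "no" = 10)
    Within each keying direction the 2 * per_key best items are ranked by |loading|
    and dealt in A, B, B, A, A, B, B, A, ... order so both forms get similar totals.
    """
    form_a, form_b = [], []
    for keyed_yes in (True, False):
        ranked = sorted((it for it in pool if it[2] == keyed_yes),
                        key=lambda it: abs(it[1]), reverse=True)[:2 * per_key]
        if len(ranked) < 2 * per_key:
            raise ValueError("not enough items keyed in this direction")
        for pos, item in enumerate(ranked):
            (form_a if pos % 4 in (0, 3) else form_b).append(item)
    return form_a, form_b

def total_validity(form):
    """Sum of |loading|: a rough index of 'wanted factor' saturation (assumed)."""
    return sum(abs(loading) for _, loading, _ in form)
```

Balancing within each keying direction separately is what guarantees the 5/5 split of "yes" and "no" keyed items in both forms; the snake order then keeps the two forms' totals close.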

These canons of good factor scale construction contain no reference to some of the procedures currently popular among educational psychologists and sociologists, for the sufficient reason that such scaling habits apply to “itemetrics” rather than “factormetrics”, as defined elsewhere (16). Mere inspection may suffice to define “content” in education and sociology, and homogenizing techniques can do the rest; but this is not true of the complex field of personality. Walker-Guttman scaling (22,25), for example, does not produce, except by chance, a simple-structure, factor-pure scale. Again, high homogeneity of items, as measured by Cronbach's alpha coefficient (19), is obviously no guarantee of factor purity. Indeed, in multidimensional material, a fair number of items selected by good factor loading and suppressor action to be most valid for a unifactor scale are likely to show negligible intercorrelation (see ref. 9:162, 359; also Diagram 1).

* After deletions for other causes, 103 of the original JPQ items have carried over into the HSPQ, Form A (11).
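Why well-chosen items can show negligible intercorrelation is easy to check with a worked example. Assume, for simplicity, orthogonal common factors (the HSPQ factors are in fact oblique, as the cosine matrix shows) and uncorrelated specifics; then the correlation two items imply is the sum of the products of their loadings. The loadings below are hypothetical: a suppressor pair that both measure X but load the irrelevant factor Y with opposite signs.

```python
def implied_r(load_j, load_k):
    """Correlation between two items implied by their loadings on orthogonal common
    factors (sum of loading products; specifics assumed uncorrelated)."""
    return sum(a * b for a, b in zip(load_j, load_k))

# Hypothetical loadings on (X, Y): both items measure X, but load Y with opposite signs.
item_a = (0.4, 0.4)
item_b = (0.4, -0.4)

r_ab = implied_r(item_a, item_b)  # X part (+.16) cancels the Y part (-.16)

# Unstandardized loadings of the unit-weighted pair: the Y components cancel.
scale = tuple(a + b for a, b in zip(item_a, item_b))
```

Here `r_ab` is zero, so a homogeneity index such as coefficient alpha computed on this pair would also be zero, yet the pair's sum carries X and no Y at all.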


We have avoided any severe screening for eccentricity (difficulty), since obviously if all items were at a 50-50 cut there would only be two scores possible (all passed or all failed) on each scale. On the other hand, as Table VIII shows, our selection by an artistic judgment rather than a rigid “minimizing” rule for item eccentricity has achieved a central tendency in standardization falling close to the numerical center of all the scales, while the scale standard deviations are such as to use the full range, giving sufficient dispersion in a normal group while allowing some “ceiling distance” for abnormal cases.

The reliabilities and validities from the above 
scale construction have finally been examined 
experimentally under the following definitions: 


1. Validity is concept validity: the goodness 
of correlation of the scale with the pure factor 
measure, e.g., with the personality concept of 
ego strength, surgency, schizothymia, etc. 
This definition of validity includes (as ‘‘indirect 
validity’’) an examination of the scale’s correla- 
tions with other concepts to see how far they 
agree with the expected correlations of these 
concepts with other constructs. For example, 
if ego strength is known to correlate -0.4 with 
anxiety, our presumed scale of ego strength 
should do the same. These two kinds of corre- 
lational checks we shall call, respectively, the 
direct and indirect validation, in the realm of 
concept (construct) validity. 

2. Reliability is susceptible to no fewer than 
ten different conceptualizations, each represent- 
ed by a specific coefficient, as shown elsewhere 
(9:352); but the three of greatest importance for 
most scale users are the consistency, stability, and equivalence coefficients. The first is examined here as an averaged-split consistency coefficient, taking the mean of the split-half consistencies obtained from three random but mutually independent cuts. The second is a test-retest agreement after a lapse of one month, to allow function fluctuations both to occur and to be assessed. And the last is the usual correlation of the form A and form B versions of each scale. These are set out in Table II.
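The three coefficients can be sketched as follows. The text does not state whether the split-half correlations were stepped up to full length, so the Spearman-Brown correction 2r/(1 + r) here is an assumption, as is the use of random shuffles to produce the three cuts. Stability and equivalence are ordinary correlations between totals on two occasions, or on the two forms. Function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def split_half(rows, half):
    """Correlate the two half-scale totals, then step up to full length with the
    Spearman-Brown formula 2r / (1 + r) (assumed; not stated in the text)."""
    a = [sum(v for i, v in enumerate(row) if i in half) for row in rows]
    b = [sum(v for i, v in enumerate(row) if i not in half) for row in rows]
    r = pearson(a, b)
    if r <= -1.0:
        raise ValueError("halves perfectly negatively correlated")
    return 2 * r / (1 + r)

def averaged_split_consistency(rows, n_cuts=3, seed=0):
    """Mean of split-half coefficients over n_cuts random halvings of the items.

    rows -- one list of item scores per person
    """
    rng = random.Random(seed)
    n_items = len(rows[0])
    coefs = []
    for _ in range(n_cuts):
        order = list(range(n_items))
        rng.shuffle(order)
        coefs.append(split_half(rows, set(order[: n_items // 2])))
    return sum(coefs) / n_cuts

# Stability and equivalence are plain correlations of total scores, e.g.
# pearson(totals_time1, totals_time2) and pearson(totals_form_a, totals_form_b).
```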


The equivalences, and still more the consistencies, run somewhat lower than those to which test users are accustomed on non-factor scales, but the meaning of this is best discussed in the following section on validity. For it will be noted that reliability in the sense of the dependability coefficient (9:352), i.e., freedom from experimental error of testing, is as good as is typically achieved by scales of this length (10 items). Allowing for the fact that function fluctuation on some personality factors, e.g., surgency-desurgency, is known to be appreciable, the stability coefficients suggest that actual test dependability is high, and the dependability coefficients (immediate test-retest, gathered on only a few cases and therefore not shown in Table II) are in fact high, though systematically not as good as for adults on 10-item scales. The marked parallelism between the consistency and equivalence coefficients, along with the high dependability, indicates that the lowered consistency and equivalence values need to be understood in terms of factor validity, which differs from the usual “homogeneity validity”, as discussed below.


6. Validities: Direct and Indirect


It has been stated that high total validity consists both in the scale correlating highly with the pure factor and in its correlating with other psychological dimensions in the manner in which the pure factor in question is known to do. These two aspects we have called direct and indirect, or circumstantial, validity estimates (11).² Correlation of the scale with the pure factor has been estimated in two distinct ways in Table III:

1. Taking the known factor loadings (actually reference vector correlations) of the items (corrected upward in the case where we have correlations only with imperfect estimates of the factor), and the correlations among the items, and calculating the multiple correlation of the scale with the factor according to the formula (16):

$$ r_{gs} = \frac{\sum_{j=1}^{n} r_{gj}}{\sqrt{n + 2\sum_{j<k} r_{jk}}} $$

where n is the number of items in the scale (in this case 10), r_{gj} is the correlation of item j with factor g (from the factor matrix), r_{jk} is the correlation of items j and k with one another (obtained here economically in 10 x 10 matrices), and r_{gs} is the correlation of the scale s with the factor. This assumes equal variance on items (approximate here) and equal weight, which our scoring system gives.

2. Taking the square root of the equivalence coefficient. This is approximate because it assumes that complete suppressor action has been achieved, so that the A and B forms share no unwanted common factor variance, each containing, besides the wanted factor, only its own specific. However, the appreciable parallelism of results from these two approaches, showing higher validities for factors B, C, H, O and Q4, and lower for A, E, D, I, Q2 and Q3, suggests that the approximations are minor. The tendency of (1) to run consistently higher than (2) in Table III is discussed below.

TABLE II: SCALE RELIABILITIES: CONSISTENCY, STABILITY, AND EQUIVALENCE COEFFICIENTS

TABLE III: SCALE VALIDITIES, BY TWO METHODS

² Circumstantial is probably the better term from its analogy to “circumstantial evidence”, for the identity of a scale with a factor is established without seeing the two together but only through the degree to which they have similar relations with all else which they encounter.
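Both direct-validity estimates can be computed in a few lines. The first function assumes the standard formula for the correlation of a unit-weighted sum of n standardized items with a criterion, Σ r_gj / √(n + 2 Σ_{j<k} r_jk); the second is the square root of the A-B equivalence coefficient. The usage example rests on an idealized, hypothetical case (ten items per form, each loading .5 on one factor and nothing else, so every r_jk = .25), where the forms share only the wanted factor and the two estimates therefore coincide exactly. Real scales depart from this ideal, which is why the text reports the two methods separately.

```python
import math

def composite_validity(r_g, r_items):
    """Correlation of a unit-weighted sum of n standardized items with factor g:
    sum_j r_gj / sqrt(n + 2 * sum_{j<k} r_jk).

    r_g     -- list of item-factor correlations r_gj
    r_items -- n x n item intercorrelation matrix (only the upper triangle is used)
    """
    n = len(r_g)
    upper = sum(r_items[j][k] for j in range(n) for k in range(j + 1, n))
    return sum(r_g) / math.sqrt(n + 2 * upper)

def validity_from_equivalence(r_ab):
    """Square root of the A-B equivalence coefficient (assumes the forms share
    only the wanted factor)."""
    return math.sqrt(r_ab)

# Idealized example: 10 items per form, each loading .5 on the factor only.
R = [[1.0 if j == k else 0.25 for k in range(10)] for j in range(10)]
v_direct = composite_validity([0.5] * 10, R)       # 5 / sqrt(32.5), about .877
# Two such forms correlate 25 / 32.5 (covariance of sums over variance of a sum).
v_equiv = validity_from_equivalence(25 / 32.5)    # same value in this ideal case
```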


Circumstantial or background concept validity has not previously been calculated in psychometrics. Its definition includes the delimitation of the universe of important psychological concepts within personality, quite distinct from the concept in the factor scale in question, that should correlate in the stated, expected ways. We have taken the universe as that of all other known personality factors (including intelligence) in the later childhood age range. The cosine matrix ((a) in Table IV) shows the correlation matrix among the pure factors, as obtained by the original simple structure rotation in our factorization, while (b) shows the correlations among the actual scales (on the mean of two samples totalling 296 cases, both A and B forms).

The proposed Index of Circumstantial Validity is the correlation of a column of (a) with the corresponding column of (b), i.e., the extent to which the given scale's correlations with other personality and ability scales agree with the pure factor's correlations with the pure concepts behind those scales. Admittedly, this is imperfect, in that it penalizes a scale not only for its own invalidity but also for the invalidity prevailing in the sample of scales against which it is correlated. Ideally one would wish to correlate each scale with the pure measures of the other scales by including it in the factored variables, but, in a sufficient sample, at least the scales are likely to be put in the correct order of validity. This order (the order of the r's in Table V) indeed proves positively in agreement (rank r = +0.32) with the order of direct validities as set out in Table III, though, as the constituent r's are based on only 13 numbers, the rank r could well vary appreciably about this value. However, we note that factors C, H, O, and Q4 tend to have higher values on both direct and indirect validities, while A, J, Q2, and Q3 tend to run lower on both, consistent with other indications. For test users the important finding here is that the circumstantial validities are positive and substantial and independently support the evidence of the direct validities.
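A minimal sketch of the proposed index follows, assuming both matrices are stored as full symmetric lists of lists in the same factor order and that the diagonal (a factor's correlation with itself) is left out of each column. The matrices and validities below are hypothetical, not the values of Tables III to V. The rank r is computed here as Spearman's rho (Pearson r on ranks, ties given average ranks), which is an assumption about the rank coefficient the authors used.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def circumstantial_validity(pure, scales, j):
    """Correlate column j of the pure-factor matrix with column j of the
    scale-score matrix, omitting the diagonal entry (row j)."""
    rows = [i for i in range(len(pure)) if i != j]
    return pearson([pure[i][j] for i in rows], [scales[i][j] for i in rows])


def ranks(values):
    """Ranks 1..n, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        k = i
        while k + 1 < len(order) and values[order[k + 1]] == values[order[i]]:
            k += 1
        for m in range(i, k + 1):
            result[order[m]] = (i + k) / 2 + 1
        i = k + 1
    return result


def rank_r(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical 4-factor matrices: (a) pure factors, (b) scale raw scores
pure = [[1.00, 0.30, -0.20, 0.10],
        [0.30, 1.00, 0.05, -0.25],
        [-0.20, 0.05, 1.00, 0.40],
        [0.10, -0.25, 0.40, 1.00]]
scales = [[1.00, 0.20, -0.10, 0.05],
          [0.20, 1.00, 0.02, -0.15],
          [-0.10, 0.02, 1.00, 0.25],
          [0.05, -0.15, 0.25, 1.00]]
print(round(circumstantial_validity(pure, scales, 0), 2))  # 0.99

# Hypothetical direct vs. indirect validities for four scales
print(round(rank_r([0.62, 0.55, 0.48, 0.70], [0.40, 0.35, 0.20, 0.38]), 2))  # 0.8
```

As the text notes, each index here rests on only as many numbers as there are other factors, so with a dozen or so entries a single aberrant scale can shift it noticeably.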


7. Standard Scores and Standard Interpretation of Factors


As pointed out above and elsewhere (9), it no longer suffices to identify and name a questionnaire factor from the “face validity” of the questions which enter into it. It must be identified either directly, by correlation with a behavior rating (L-data) factor, or by correlation with a questionnaire factor previously identified by those criteria.

It was planned in this case to administer the HSPQ along with the recently checked and intensified 16 P. F. (7, 8, 18) to a group of children (a) at an age in the border range of overlap between the two scales, namely 16 and 17 years, and (b) also at an early or middle age in the HSPQ range, 12 and 13, but to a group of children of above-average intelligence, so that they would have no difficulty of comprehension with the 16 P. F. test. The latter would bring the benefit both of an independent sample and of a check on the effect of age upon factor meaning. It is obviously important in such cross-identification to use the full-length battery on both sides, for the reliability of a single factor measurement on one form only is such that the agreement of one factor with itself (across two forms) might not be reliably distinguishable from the correlation normally existing between one oblique factor and another (when sampling error is added to both correlations).

Since the agreement between the 16 P. F. correlations from the two samples (one of 175 cases, the other of 121) is represented by a correlation of 0.7, their results are combined here, for economy of presentation, into a single set of mean r's, as shown in Table VI, which, for simplicity, records only values derived when both r's were significant beyond the P = .01 level in the two original contributing tables. It will be seen that there are several correlations other than those (underlined) making the match itself which are statistically significant. These, e.g., the r's of Q4 with C, O, and Q3, are mainly the usual significant correlations found among these oblique factors (compare Table IV) forming the second-order anxiety factor (9:319), but two aberrations will be discussed in a moment. The decision on matching and identifying, as indicated by the letters in Table VI, rests on (1) the matching r's in question being significant beyond the P = .001 level, (2) the r's also being simultaneously the highest in the row and in the column, i.e., the best for the identified factor approached from either scale, and (3) the r's being essentially of the same magnitude as the equivalence coefficients within one scale (Table II and ref. 18). It will further be noted that the majority of identifications agree with those independently, but tentatively, made




TABLE IV 


HSPQ FACTOR INTERCORRELATIONS: BASIS FOR ESTIMATION OF INDIRECT 
VALIDITY COEFFICIENTS 


(a) Correlations among Pure Factors (Simple Structure)


A C D E F G H


05 
-14 -22 
-10 -15 29 

05 00 28 36 

14 -10 -15 -34 -31 
38 19 -44 -47 05 
22 -21 -04 -26 -14 
-11 -10 -15 -12 -07 
-04 -14 09 22 -06 
08 00 25 -12 -06 
-08 20 -33 -38 -25 
-16 -07 24 15 -05 


(b) Correlations among Factor Scale Raw Scores 
A B C D F G


13 

12 -02 
-17 -05 -20 

-22 -13 -12 19 

07 -03 02 05 09 

18 18 14 -21 -24 

21 -02 35 -29 -16 10 

10 03 -15 -05 -21 13 -05 
-00 11 -12 -02 -10 25 -08 17

19 -01 -33 22 16 -15 -42 07 08 

07 06 08 -02 -05 17 03 01 12 -07 
16 14 11 -12 -18 23 12 11 14 -19
-15 -03 -31 30 14 -14 -32 07 09 33 




Note: Correlations in (a) based on 168 twelve- and thirteen-year-old boys and girls. 
Correlations in (b) based on 296 twelve- through fourteen-year-old boys and girls. 



I J oO @ %4 
36 
21 26 
01 -07 14 
-22 -22 -11 -07 
Q2 15 03 09 -06 -22 
Q3 20 27 19 08 -10 06 
Q4 -33 -43 -09 18 14 -20 -07 
Be H I J oO Q2 Q3 Q4 
-07 -13 


TABLE V

CIRCUMSTANTIAL, INDIRECT VALIDITY FOR HSPQ FACTOR SCALES

[Table body not legible in the source scan.]

TABLE VI

IDENTIFICATION OF HSPQ FACTORS IN TERMS OF 16 P. F. FACTORS (FULL LENGTH SCALES ON BOTH)

[Table body not legible in the source scan.]

Note: Based on 296 cases.




in the original work on the JPQ (11). 
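The combination of samples and the first two identification rules can be sketched as follows. The text says only that the two samples' correlations were combined into a single set of mean r's; averaging through Fisher's z with weights of n - 3 is an assumption here, as is the large-sample normal approximation used for the P = .001 test. Rule (3), the comparison with within-scale equivalence coefficients, is left to judgment and not coded. The correlations passed in are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_r(rs, ns):
    """Combine correlations from independent samples through Fisher's z,
    weighting each sample by n - 3, then transform back to r."""
    zsum = sum((n - 3) * math.atanh(r) for r, n in zip(rs, ns))
    return math.tanh(zsum / sum(n - 3 for n in ns))


def significant(r, n, p=0.001):
    """Rule (1): two-tailed test of rho = 0 via t = r*sqrt(n-2)/sqrt(1-r^2);
    the t critical value is approximated by the normal one (large n)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return abs(t) > NormalDist().inv_cdf(1 - p / 2)


def best_in_row_and_column(matrix, i, j):
    """Rule (2): entry (i, j) is the highest in its row and in its column
    of the cross-correlation matrix (HSPQ rows by 16 P. F. columns)."""
    value = matrix[i][j]
    return value == max(matrix[i]) and value == max(row[j] for row in matrix)


r = pooled_r([0.60, 0.50], [175, 121])  # sample sizes from the text
print(round(r, 3), significant(r, 175 + 121))  # 0.561 True
```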

This table, however, brings out a rather poor definition of A (agreeing with its low reliability), though as good, incidentally, as for the veteran intelligence factor B, on the same fewness of items. G and Q3 are also weak, again in rough agreement with the reliability evidence. However, in the case of D and Q3 the peculiar situation arises that the highest r in the row for 16 P. F. factor Q3 is D, not Q3. We have examined this problem in the light of much more evidence than can be set out here, notably the effects of reliability differences and the indirect validities, and have come to the conclusion that D and Q3 are probably incompletely separated in the HSPQ and that Q3 in the 16 P. F. has a good deal of D(-) in it. Psychologically, the difference is that D is excitability, while Q3 is self-control, the former probably being temperamental and genetic, and the latter connected with the rise of the self-sentiment. Special research will be needed to separate them in the adult, where they seem to behave virtually as face and obverse of the same behavior. It will be noted that M, N, and J have no correlations with the other scales equal to their correlations with themselves, which adds to the proof of their separateness (Table II; ref. 18) from any factor in the other battery. If the identifying correlations with the same factors in the other scale are sometimes (A, B, Q2, Q3) rather lower than the equivalences in the same scale, it must be remembered that in the age group chosen, in the no-man's-land necessary to reach both scales, factors B and Q2 in the HSPQ are too easy and too childish, respectively, while 16 P. F. factor B is too hard and factor Q2 a little too sophisticated in activity reference. Obviously, however, further intensification of the HSPQ factors is still necessary, and is being carried out, to get still clearer 16 P. F. identification and mutual separation.

The standardization of the HSPQ thus labelled 
as to its factors, and with factor B (intelligence) 
added, was carried out with the following objec- 
tives: 


1. To take American high school pupils, ages 12 through 16, as the reference population.

2. To standardize separately for boys and girls. For, as the results in Table VII show, consonantly with those of the adult 16 P. F., there are significant sex differences on about half of the personality factors. On the other hand, age trends are significant (Table VII) on only three factors, and for this reason, to avoid multiplication of standardization tables, it is proposed that in these cases the raw scores simply be corrected for age before entering the main tables.

3. To have norms available both in point scores (in this case stens) and in ranks (in this case deciles). Stens are preferred to stanines by most users, apparently because use of the decimal system has accustomed us to ten-point scales; but the mean and sigma of raw scores provided in the tables enable stanines also to be readily calculated when desired. The stens are standard stens (18); that is to say, they do not literally represent half-sigma units along the obtained distribution, but units fitting the areas normally covered by half a sigma when the obtained distribution is made into a normal curve by expanding or contracting the raw-score scale on the baseline. In other words, sten and decile values translate into one another according to the standard relations of the normal curve.

4. To have norms both for the single forms A and B and for the two equivalent forms used together. For, owing to the lack of perfect correlation between the forms, the standard score on A and B combined will not be literally the mean of the standard scores obtained from the two parts. This three-fold presentation, together with sex differentiation and the sten and decile alternatives, results in twelve (3 x 2 x 2) norm tables, which have been supplied with the Handbook (11).
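The conversions in objectives 3 and 4 can be sketched as below. Everything here is an illustrative assumption rather than the Handbook's procedure: percentiles are taken as mid-ranks in a norm sample, stens are normalized stens with half-sigma bands about a mean of 5.5, stanines are computed linearly from a tabled raw-score mean and sigma (as the text says the tables permit), and the A plus B composite assumes both forms are already expressed as z scores with a known A-B correlation.

```python
import math
from statistics import NormalDist


def percentile_rank(raw, norm_sample):
    """Mid-rank percentile of `raw` in a norm sample, as a proportion."""
    n = len(norm_sample)
    below = sum(1 for x in norm_sample if x < raw)
    equal = sum(1 for x in norm_sample if x == raw)
    pr = (below + 0.5 * equal) / n
    # keep strictly inside (0, 1) so the inverse normal is defined
    return min(max(pr, 0.5 / n), 1 - 0.5 / n)


def normalized_sten(pr):
    """Standard sten: normal z for the percentile, cut at half-sigma
    boundaries around a mean of 5.5, limited to 1..10."""
    z = NormalDist().inv_cdf(pr)
    return min(10, max(1, math.floor(2 * z + 6)))


def decile(pr):
    """Decile rank 1..10 from a percentile expressed as a proportion."""
    return min(10, math.floor(pr * 10) + 1)


def stanine_from_raw(raw, mean, sigma):
    """Stanine from tabled raw-score mean and sigma (linear z, not normalized)."""
    z = (raw - mean) / sigma
    return min(9, max(1, math.floor(2 * z + 5.5)))


def composite_z(z_a, z_b, r_ab):
    """Standard score of the A+B total: the sum of two z scores has
    variance 2 + 2*r_ab, so it is not simply the mean of z_a and z_b."""
    return (z_a + z_b) / math.sqrt(2 + 2 * r_ab)


print(normalized_sten(0.55), decile(0.55))     # 6 6
print(stanine_from_raw(14, mean=10, sigma=3))  # 8
print(round(composite_z(1.0, 1.0, 0.4), 3))    # 1.195
```

The last line shows the point of objective 4: a pupil one sigma above the mean on each form, with forms correlating .4, stands about 1.2 sigma above the mean on the combined battery.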


It is of theoretical interest that the larger sex differences above, indicating boys to be more schizothyme and of greater ego strength, and girls to be decidedly more premsic and of greater self-sentiment development, agree with those found for adults (18), though the greater dominance and lesser guilt proneness (O factor) found for adult men are found in boys here only at a lower level of significance. On the other hand, the age trends are not so consistent, for self-sentiment control is falling here, in early adolescence, whereas it rises slightly through post-adolescent life. There is agreement, however, on an increase in ego strength and a decrease in ergic tension throughout the life course. Age corrections are finally recommended in the Handbook only for B, intelligence, and Q3, self-sentiment control.
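Table VII reports t ratios without stating the formula. Purely as an illustration, a pooled-variance Student t for a boys-versus-girls comparison on one factor might be computed as below; the choice of the pooled-variance form and the scores are assumptions. With roughly 330 degrees of freedom, as in the 333-case sex comparison, a |t| above about 2.59 corresponds to P < .01, two-tailed.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Student's two-sample t with pooled variance; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (divisor n - 1)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, df


# Hypothetical complete-battery raw scores on one factor
girls = [14, 16, 15, 17, 13, 16, 15]
boys = [12, 13, 11, 14, 12, 13, 12]
t, df = pooled_t(girls, boys)
print(round(t, 2), df)
```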

The present initial norm tables (in skeleton form in Table VIII) are based on a sample of boys and girls from 12 through 18 years of age, gathered from 17 schools in different parts of the country, but mainly from middle-sized towns in the midwestern states and Texas. As the values in Table VIII show, we have in general been successful in choosing an extremity-cut on items such that the mean raw score occupies the approximate center of the possible raw-score range, while three times the sigma, each way, spreads out to cover the possible raw-score range.
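The two conditions just described can be checked with simple arithmetic. In this sketch the possible range of 0 to 20 is a hypothetical stand-in (for instance, ten items each scored 0, 1, or 2), and the mean and sigma are invented for illustration, not taken from Table VIII.

```python
def raw_score_fit(mean, sigma, lowest, highest):
    """Return the mean's offset from the midpoint of the possible range
    (in sigma units) and the interval mean +/- 3 sigma, for comparison
    with the possible raw-score range."""
    midpoint = (lowest + highest) / 2
    offset = (mean - midpoint) / sigma
    return offset, (mean - 3 * sigma, mean + 3 * sigma)


offset, (low, high) = raw_score_fit(mean=10.4, sigma=3.2, lowest=0, highest=20)
print(round(offset, 3), round(low, 1), round(high, 1))  # 0.125 0.8 20.0
```

An offset near zero and a three-sigma interval close to the end points of the range are the two marks of success the authors cite.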




TABLE VII

SIGNIFICANT AGE AND SEX DIFFERENCES ON PERSONALITY FACTORS


1. Age Trends*

Factor                          t       Differences     P Value (Double-tailed)

B    Intelligence              3.57     Older higher    .001
C    Ego Strength               .81     Older higher    Not significant
E    Dominance                  .69     Older higher    Not significant
G    Super Ego Strength        1.22     Older higher    Not significant
J    Coasthenia                2.32     Older higher    .02
Q3   Self Sentiment Control    3.70     Older lower     .001
Q4   Ergic Tension              .71     Older lower     Not significant

*This part of the table is restricted to factors in which the trend is the same on the A and B forms. t values are for the complete battery score, on 500 cases.


2. Sex Differences

Factor                          t       Differences     P Value (Double-tailed)

A    Cyclothymia               4.66     Girls higher    .01
C    Ego Strength              5.65     Boys higher     .01
F    Surgency                  3.33     Girls higher    .01
G    Super Ego Strength        3.63     Girls higher    .01
H    Parmia                    2.70     Boys higher     .01
I    Premsia                  12.14     Girls higher    .01
J    Coasthenia                5.55     Girls higher    .01
Q3   Self Sentiment Control    2.96     Girls higher    .01

Note: From complete battery (A and B) scores; 333 cases.


TABLE VIII

[Raw-score means and sigmas by factor for Form A and Form B; values not legible in the source scan.]

Note: Based on 1089 boys and girls aged 12 through 18 years.


DIAGRAM I

THE DIFFERENCE BETWEEN VALIDITY AND HOMOGENEITY
WHEN SUPPRESSOR ACTION IS INVOLVED

[Diagram not legible in the source scan.]




8. Discussion: The Relative Value to be Assigned to Reliability and Validity Coefficients


For school psychologists accustomed to expect reliabilities of 0.95 and upward (at least in test catalogues!) for achievement tests, or in very specific attitude questionnaires, some understanding is required of the changing importance of reliability and validity as one turns to personality tests. On the whole, the present multi-dimensional scale presents validities (per ten items) higher than have hitherto been achieved in the personality field; but the consistency (homogeneity) and equivalence (A and B forms, or with the 16 P. F.) coefficients are low relative to the stability and validity coefficients, in terms of experience in other areas.

The main impression may perhaps be summarized in the observation that reliability does not exceed validity to the degree one expects from experience with specific educational tests. Now elsewhere (9) we have tentatively stated the principle that there is a tendency for most everyday-life behavior (as in the content of questionnaires) to be factorially complex when it (a) really “involves” a lot of personality, i.e., has a desirably high personality-factor loading communality, and, therefore, (b) successfully avoids that degree of specificity of item situation which would make it of unstable meaning from sample to sample and subculture to subculture. Apropos of the latter, it has been the experience of many psychologists that highly homogeneous scales, whether obtained by the techniques of Walker (25) and Guttman (22) or by other devices for high internal consistency (which lead to virtually rewriting the same item in many near-synonymous forms!), are generally (a) far more likely to deal with psychologically very narrow, specific interests, etc., than with broad primary personality factors such as general personality theory recognizes, and (b) more liable to show, from sample to sample and from one testing situation to another, an instability in the loadings of whatever broad personality factor they do contain. It is as if the specificity characteristic of a mere single item, as well as this single item's instability with subculture and testing situation, were multiplied in such highly homogeneous scales to cover the whole scale.

If this principle is correct, good personality factor scales are most likely to be obtained by using factorially complex items, which must therefore use suppressor action, i.e., an attempt to balance out unwanted common factors, to give a scale for a single factor. As a consequence, (a) homogeneity will tend to be low, and (b) we must get accustomed to using tests with more than ten or a dozen items if it is judged desirable to obtain the consistency reliabilities which are easily reached with comparatively few items in the specific-factor “homogeneous” type of test. In other words, instead of sacrificing essential validity to a show of high consistency and high equivalence-reliability coefficients, we should do better to choose our tests more by their real factor validity coefficients (concept validity), and gain high consistency and equivalence coefficients additionally, if desired, by lengthened tests.
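The gain from lengthening is usually quantified with the Spearman-Brown formula. The sketch below assumes that added items are parallel to the existing ones, which scales built on suppressor action may satisfy only roughly; the starting reliability of .35 for ten items is a hypothetical figure in the range the authors mention for ten-item scales.

```python
import math


def spearman_brown(r, k):
    """Projected reliability when a test is lengthened k-fold with parallel items."""
    return k * r / (1 + (k - 1) * r)


def length_factor_needed(r, target):
    """Inverse Spearman-Brown: how many times longer the test must become."""
    return target * (1 - r) / (r * (1 - target))


r10 = 0.35                                 # hypothetical 10-item reliability
print(round(spearman_brown(r10, 2), 3))    # 0.519 with both forms (20 items)
k = length_factor_needed(r10, 0.80)
print(math.ceil(10 * k))                   # 75 items for a reliability of .80
```

Doubling the scale, as urged for research use below, raises such a coefficient only into the low fifties; consistency figures typical of homogeneous tests would need several times the present length.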

This point can be quickly and graphically illustrated by Diagram I, which shows the basic situation in suppressor action. Items a and b each correlate +.7 with the required factor F, but are chosen to have opposite-sign loadings (+.7 and -.7) on the unwanted factor U. If these were A and B forms of a test (A and B forms could be made easily enough by multiplying such items), they would actually have an equivalence reliability coefficient of zero. At the same time they would have separate validities of 0.7 and a combined validity of virtually 1.0. This is not a trick case. The more realistically complex experimental situation which commonly exists, comprising specifics, error, and more than one unwanted common factor, would merely complicate the form of the calculation but leave the principle still operative. However, the systematic trend noted in our Table III for validity calculated from internal loadings (multiple r) to be greater than that calculated from equivalent-form correlations shows that low homogeneity through widespread suppressor action is not the whole problem. Comparison of our HSPQ and 16 P. F. results suggests additionally that research should examine the hypothesis that in children there is appreciably greater function fluctuation on traits and lower dependability on items. (If the validity from the second main row of Table III is corrected for attenuation by the test-retest error represented in the second main row of Table II, it rises to at least the internally calculated validity of row one in Table III.)
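The arithmetic of Diagram I can be checked directly under the simplest model the passage describes: standardized items, uncorrelated factors F and U, and unique variance making up the rest of each item's unit variance. The function names are illustrative. The same block includes the usual correction for attenuation (r divided by the square root of the reliability), which the parenthetical remark applies with test-retest error; the coefficients used with it are hypothetical.

```python
import math


def inter_item_r(a_f, a_u, b_f, b_u):
    """Correlation of two standardized items with loadings on F and U."""
    return a_f * b_f + a_u * b_u


def composite_validity(load_f, load_u):
    """Correlation of the unit-weighted sum of standardized items with F.

    Each item has unit variance (loadings squared sum to at most 1, the
    remainder being unique variance uncorrelated across items).
    """
    k = len(load_f)
    cov_with_f = sum(load_f)
    # var(sum) = k unit variances + 2 * (sum of inter-item covariances)
    var_sum = k + 2 * sum(
        inter_item_r(load_f[i], load_u[i], load_f[j], load_u[j])
        for i in range(k) for j in range(i + 1, k)
    )
    return cov_with_f / math.sqrt(var_sum)


def correct_for_attenuation(r, reliability):
    """Validity corrected for unreliability in the test only."""
    return r / math.sqrt(reliability)


print(inter_item_r(0.7, 0.7, 0.7, -0.7))                     # 0.0
print(round(composite_validity([0.7], [0.7]), 2))            # 0.7
print(round(composite_validity([0.7, 0.7], [0.7, -0.7]), 2))  # 0.99
print(round(correct_for_attenuation(0.55, 0.80), 3))         # 0.615
```

With loadings of exactly .7 the combined validity comes out at about .99; it reaches 1.0 only when the loadings are 1/sqrt(2), about .707, so the text's figure of 1.0 is a round value for this case.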

Accordingly, if we wish to measure fourteen major personality dimensions in a forty-minute test, with a resultant cut to 10 items per factor, the best research on test construction at this time can only produce reliabilities (equivalence or consistency coefficients) in the thirties, but validities in the sixties. Since the purpose of good reliability is to make good validity possible, we should welcome this order of coefficients in preference to the converse! However, since the maximum possible validity for a given type of instrument is always desirable, and something above the sixties is to be preferred, one must strongly urge that the experimental research now possible on personality in children through employing these factor scales should (a) use both forms of the test, i.e., at least 20 items per factor, and (b) take greater care than in the ordinary administration of questionnaires to adults, specifically to reduce fatigue and to sustain motivation and carefulness.

The writers wish to express their gratitude to Drs. Tollefson and Rutherford Porter for careful collation of results and to those psychologists and teachers in Illinois, Indiana, and Texas whose cooperation brought the work to successful completion.


BIBLIOGRAPHY 


1. Baggaley, A. R., and Cattell, R. B. “Exact and Approximate Linear Function Estimates of Oblique Factor Scores for Individuals,” British Journal of Statistical Psychology, IX (1956), pp. 12-21.

2. Bargmann, R. “Signifikanzuntersuchungen in der Einfachen Struktur in der Faktoren-Analyse,” Mitteilungsblatt für Mathematische Statistik (Sonderdruck, Würzburg: Physica-Verlag, 1954).

3. Berg, I. A., and Rapaport, G. M. “Response Sets in a Multiple Choice Test,” Educational and Psychological Measurement, XV (1955), pp. 58-62.

4. Cattell, R. B. The Cattell Group Intelligence Tests, Scales C, I, II, and III: For 4 Years to Adulthood (London: Harrap, 1930).

5. Cattell, R. B. The Description and Measurement of Personality (New York: World Book Co., 1946).

6. Cattell, R. B. Factor Analysis (New York: Harper Brothers, 1952).

7. Cattell, R. B. “A Shortened ‘Basic English’ Version (Form C) of the 16 P. F. Questionnaire,” Journal of Social Psychology, XLIV (1946), pp. 257-78.

8. Cattell, R. B. “Validation and Intensification of the Sixteen Personality Factor Questionnaire,” Journal of Clinical Psychology, XII (1956), pp. 205-14.

9. Cattell, R. B. Personality and Motivation Structure and Measurement (New York: World Book Co., 1957).

10. Cattell, R. B. “Formulae and Table for Obtaining Validities and Reliabilities of Extended Factor Scales,” Educational and Psychological Measurement, IV (1957), pp. 491-98.

11. Cattell, R. B. The High School Personality Questionnaire (Champaign, Ill.: Institute for Personality and Ability Testing, 1604 Coronado Drive, 1958).

12. Cattell, R. B., and Beloff, H. “Research Origin and Construction of the IPAT Junior Personality Quiz,” Journal of Consulting Psychology, VI (1953), pp. 436-42.

13. Cattell, R. B., Blewett, D. B., and Beloff, J. R. “The Inheritance of Personality: A Multiple Variance Analysis Determination of Approximate Nature-Nurture Ratios for Primary Personality Factors in Q-data,” American Journal of Human Genetics, VII (1955), pp. 122-46.

14. Cattell, R. B., and Coan, R. W. “Personality Factors in Middle Childhood as Revealed in Parent's Ratings,” Child Development, XXVIII (1957), pp. 439-58.

15. Cattell, R. B., and Gruen, W. “Primary Personality Factors in the Questionnaire Medium for Children, 11-14 Years Old,” Educational and Psychological Measurement, XIV (1954), pp. 50-76.

16. Cattell, R. B., and Radcliffe, J. “Reliabilities and Validities of Simple and Extended Weighted and Buffered Unifactor Scales,” British Journal of Statistical Psychology (in press).

17. Cattell, R. B., and Saunders, D. R. “Beiträge zur Faktoren-Analyse der Persönlichkeit,” Zeitschrift für Experimentelle und Angewandte Psychologie, 1173 (1955), pp. 325-ST.

18. Cattell, R. B., and Stice, G. R. The Sixteen Personality Factor Questionnaire, Revised Edition (Champaign, Ill.: Institute for Personality and Ability Testing, 1604 Coronado Drive, 1957).

19. Cronbach, L. J. “Coefficient Alpha and the Internal Structure of Tests,” Psychometrika, XVI (1951), pp. 297-334.

20. Cronbach, L. J. “Further Evidence on Response Sets and Test Design,” Educational and Psychological Measurement, X (1950), pp. 3-31.

21. French, J. W. The Description of Personality Measurements in Terms of Rotated Factors (Princeton, N. J.: Educational Testing Service, 1953).

22. Guttman, L. “A Basis for Scaling Qualitative Data,” American Sociological Review, IX (1944), pp. 139-50.

23. O'Halloran, Ann. An Investigation of Personality Factors Associated with Achievement in Arithmetic and Reading, Master's Thesis, Purdue University, 1954.

24. Thurstone, L. L. Multiple Factor Analysis: A Development and Expansion of the Vectors of the Mind (Chicago: University of Chicago Press, 1947).

25. Walker, D. A. “Answer-Pattern and Score Scatter in Tests and Examinations,” British Journal of Psychology, XXII (1931), pp. 73-86.


JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, December 1958) 


THE BEHAVIOR OF TEACHERS AND THE PRODUCTIVE
BEHAVIOR OF THEIR PUPILS:
I. “PERCEPTION” ANALYSIS*


MORRIS L. COGAN 
Harvard University 


Introduction 


The Measurement of Competence 


THE RECENT history of education in the United States is marked by numerous attempts to evaluate the competence of teachers. In view of the importance of formal education in contemporary society, such a preoccupation is readily understood. The findings of competence studies have, however, been inconsistent and unconvincing. Many reasons may be advanced to account for this slow progress in the study of the teacher's effectiveness. Foremost among these is the fact that there is little agreement on a basic definition of the good teacher. Under such conditions, when fundamental issues remain unresolved, it is almost inevitable that the results of even the most rigorous research should be severely criticized. Such conflicts, essentially philosophical in nature, will not be resolved by the present study. The purpose of this research is to provide some objective evidence upon which ultimate value judgments may be based.

In many competence studies, the criterion 
measures have been defined as the opinions of 
experts, supervisors, and principals. The weak- 
ness of such measures is that logically and ex- 
perimentally it has been relatively easy to dem- 
onstrate that they are often not significantly re- 
lated to pupil change, growth, development—al- 
though these variables are generally recognized 
as valid criteria of teacher competence. 

On the other hand, when pupil change has been adopted as the criterion measure and has been operationally defined, the usefulness of the findings is severely restricted by limitations in the instruments and techniques at present available for the measurement of subject-matter achievement, of growth in social and learning skills, and of changes in attitudes. To these two complications a third may be added: the difficulty of identifying a specific teacher to whom such changes can be attributed, a problem that becomes especially acute in the departmentalized grades.

* All footnotes will be found at the end of the article.

The focus of the present study is an investiga- 
tion of the relationships between certain specific, 
observable behaviors of teachers and the amounts 
of required work and class-related self-initiated 
work performed by their pupils. Three crucial 
factors are involved in the research design. The 
first is that the criterion measures are taken in 
terms of the amount of work performed by the 
pupils. Such consequent measures avoid the dis- 
advantages of ratings by principals and supervis- 
ors. They fall short, however, of measuring pu- 
pil change. Nevertheless, it is felt that pupil 
work is very closely related to pupil change in 
the learning sequences of the classroom. If it is 
at present impracticable to measure pupil change, 
then the measurement of pupil work as the vari- 
able intervening just prior to such change may be 
a productive concept. 

A second important element of the research is the use of specific, clearly defined classroom behaviors of teachers as the independent variables, in contradistinction to the fairly common use of a sort of global variable called “competence”.

The third major factor in the design of this study is its reliance upon the reports of pupils as the most important source of data concerning their work and the behaviors of their teachers. Although the teachers' ratings of the pupils' work and the principals' reports on the behaviors of their teachers are both included in the data collected, the primary emphasis is upon the data secured from pupils, since they are in an excellent position to report on their own work and on the behaviors of their teachers.


The Theory and the Variables


The dependent variables of the study are (1) the amount of required work performed by the pupils, and (2) the amount of self-initiated work performed by the pupils. These scores are obtained through the responses to a questionnaire called the “Pupil Survey”.

The required work score is secured by presenting to the pupils a list of 30 of the most common types of assignments, on which they are asked to report on their work with a specified teacher. The scale points and their scored values are: (0) this kind of homework is not given in this subject; when it is given I do it (1) almost never, (2) few times, (3) sometimes, (4) many times, (5) almost always. Some illustrations of the kinds of homework items provided in the “Survey” are: do drill exercises; memorize rules; solve number problems; and prepare a debate.

The self-initiated score is derived from the pupils' responses to 25 items dealing with common, school-related activities. The items name some of the voluntary activities commonly encountered in secondary schools, for example, performing extra exercises; making extra models and charts; making visits to museums; doing extra experiments. A six-point frequency scale is provided with each item, running from “I never do this” to “I do it very often”.
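The text does not spell out how item responses are aggregated into the two work scores. Assuming, for illustration only, that each score is the simple sum of the scored values, the computation could look like this; the 0 to 5 coding of the six-point self-initiated scale and the response lists are hypothetical.

```python
REQUIRED_ITEMS = 30        # assignment types listed in the Pupil Survey
SELF_INITIATED_ITEMS = 25  # voluntary, class-related activities


def work_score(responses, n_items, low=0, high=5):
    """Sum of scored responses, after checking the count and the range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(not low <= r <= high for r in responses):
        raise ValueError("response outside the scored range")
    return sum(responses)


# Hypothetical pupil: "sometimes" (3) on every assignment type,
# and a middling 2 on every voluntary activity
required = work_score([3] * REQUIRED_ITEMS, REQUIRED_ITEMS)
self_initiated = work_score([2] * SELF_INITIATED_ITEMS, SELF_INITIATED_ITEMS)
print(required, self_initiated)  # 90 50
```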

The theoretical basis for the reliance of the 
study upon the pupils’ required and self-initiated 
work rests upon two demonstrations: (1) that 
such work is a necessary pre-condition for most 
school learning, and (2) that such work is prox- 
imate to pupil change. The process by which 
teacher behaviors are related to pupil change, 
and the relationship of pupil work to pupil change 
may be expressed as follows: 

The behavior of teachers as perceived by the 
pupils influences the nature and extent of (1) the 
motivation of pupils, (2) communication with 
pupils, and (3) the ‘‘tone’’ of the classroom ex- 
periences, which may instigate certain pupil 
work resulting in pupil change. 

The teacher behaviors represented as the first in the train of events leading to pupil change constitute the independent variables of the hypotheses. The theory of the effects of certain kinds of teacher behavior upon the pupils' work is derived largely from the social learning concepts of Miller and Dollard.¹

These writers have developed strong evidence for the occurrence of processes whereby, if their conclusions may be generalized to the classroom situation, the teacher may become on the one hand a cue for anxiety or, on the other, for “liking” or “respect”. An appropriate response to anxiety is avoidance of some sort; an appropriate response to liking is approach. Thus the teacher who becomes a cue for strong anxiety will motivate his pupils to an acceptable minimum of required work; i.e., the pupils will use the most expeditious means of avoiding an anxiety-laden stimulus. They will, in addition, tend to perform very little self-initiated work, since this would be the symbolic equivalent of remaining longer than absolutely necessary in proximity to an unpleasant situation. On the other hand, the concept of a gradient of approach suggests that pupils will perform much more self-initiated work for the teacher who becomes a cue for approach. Those actions tending to make the teacher a cue for approach are termed “inclusive”; those that tend to make him a cue for avoidance are termed “preclusive”.

The third independent variable has been called “conjunctive”. Although this designation is suggested by H. A. Murray's² terminology, it is here used to describe actions very different from those he envisioned. “Conjunctive” refers to those behaviors of the teacher which give indications of (1) his ability to communicate with his pupils; (2) his efficiency in classroom management; (3) his command of, and creativity in dealing with, his subject matter; and (4) the level of his demands upon the pupils. These behaviors are very much less affect-laden than those of the inclusive and preclusive categories; they are, nevertheless, considered to be a major factor in the teaching-learning process.

The specific behaviors which tend to cause the pupils to perceive the teacher as inclusive, preclusive, or conjunctive have been drawn from the work of Murray³; Lewin, Lippitt, and White⁴; Anderson, Brewer, and Reed⁵; Cattell⁶; and from the writer's observation and experience.

Some indications of the organization of the in- 
dependent variables, together with illustrative 
items for each, may serve both to clarify the the- 
oretical framework and to exemplify the method 
by which scores were obtained for each variable: 


I. Inclusive 
Integrative 
B. Affiliative 
C. Nurturant 


Preclusive 

A. Dominative 
B. Aggressive 
C. Rejectant 


Conjunctive 

A. Indicating level of demand 

B. Indicating ability to communicate 

C. Indicating competence in classroom man- 
agement 


Items (5-point frequency scales provided for each
item):

   Aggressive (item 30): This teacher shouts
   or yells at us.

   Rejectant (item 44): This teacher says that
   certain pupils ought not to be in this class.

   Integrative (item 33): When we start new
   work, this teacher helps us to see why
   this work is important to all of us.

   Affiliative (item 29): This teacher is
   friendly.

   Level of demand (item 57): This teacher
   requires pupils to do work that is correct
   and in good order.

   Ability to communicate (item 54): This
   teacher explains things so I can understand
   them.
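A minimal sketch of how item responses could be totalled into the three variable scores, assuming (the text does not state this) that each score is the unweighted sum of its items' 1-to-5 frequency ratings. Only the six illustrative items above are mapped; `ITEM_MAP`, `variable_scores`, and the sample responses are hypothetical, and the actual instrument contains many more items per category.

```python
# Hypothetical item-to-variable map covering only the six items quoted above.
ITEM_MAP = {
    30: ("preclusive", "aggressive"),
    44: ("preclusive", "rejectant"),
    33: ("inclusive", "integrative"),
    29: ("inclusive", "affiliative"),
    57: ("conjunctive", "level of demand"),
    54: ("conjunctive", "ability to communicate"),
}


def variable_scores(responses: dict[int, int]) -> dict[str, int]:
    """Sum one pupil's 1-5 ratings of one teacher into a score per variable.

    ASSUMPTION: unweighted summation; the scoring weights are not given.
    """
    scores = {"inclusive": 0, "preclusive": 0, "conjunctive": 0}
    for item, rating in responses.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"item {item}: rating {rating} is outside 1-5")
        variable, _subcategory = ITEM_MAP[item]
        scores[variable] += rating
    return scores


# One hypothetical pupil's report on one teacher.
pupil = {30: 1, 44: 1, 33: 4, 29: 5, 57: 4, 54: 3}
print(variable_scores(pupil))
```

Keeping the subcategory in the map would allow subscale totals (e.g., aggressive versus rejectant) to be added later without changing the item lookup.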


The Sample and Some Hypotheses 


The population sampled consists of the public 
school teachers in departmentalized secondary 
schools in communities located within the metropolitan
Boston area. From this population two
communities were selected which differed in so- 
cioeconomic characteristics. Among the socio- 
economic criteria employed were: (1) median 
school years completed by persons 25 years old 
and over; (2) median income; (3) percent of 
males, 25 years old and over, with four years 
of college education; and (4) percent of workers 
in various occupations ranging from profession- 
al, technical, and kindred workers to crafts- 
men and laborers. The two communities fin- 
ally selected exemplified sharply different socio- 
economic conditions. Thus it was hoped, if there 
was an organized ‘‘socioeconomic community ef- 
fect’’ on teachers, that such an effect would be- 
come evident. Where significant community ef- 
fects have been observed in this study, the find- 
ings are of course not generalizable to other 
communities within the population; on the other 
hand, the findings are more generalizable from 
analyses in which significant among-community 
differences are not observed. Since this is a 
first study, it is difficult to indicate what limits 
should be placed upon the results. Perhaps it is
fair to say that teachers were selected from two 
communities differing in socioeconomic charac- 
teristics as sharply as possible, within the com- 
munities available to the researcher. 

The problem of generalizing from the within- 
teacher findings is another matter. In speaking 
of the measures observed for a single teacher, 
the test of significance is definitely based on a 
sampling of the pupils who are taught by the 
teacher in the course of his professional lifetime. 
For the within-teacher findings, therefore, gen- 
eralization to the pupils of the teacher involved 
is plausible; generalization to pupils of teachers
other than those in the sample may be possible 
if the findings about different teachers are in 
agreement, i.e., if some regularities among 
teachers become evident from the analysis of 
within-teacher data. Such generalizations, how- 
ever, would also be bound by the limits of the 
sampling plan discussed above. 




Data were collected from 33 teachers, five 
principals, and 987 eighth grade pupils in five 
public junior high schools. The pupil sample at 
the eighth grade level was chosen in order to minimize
the selective factor of school dropouts,
which begins to operate strongly in the more ad- 
vanced grades. An eighth grade sample is more 
representative of the total population of secondary 
school pupils in metropolitan Boston than is one 
drawn from higher grades. Nevertheless, one 
can generalize the findings of this study only with 
extreme caution to any grades other than eighth 
since we do not know how pupil perceptions of 
teachers vary with age. 

The choice of departmentalized schools was 
dictated mainly by the interests of the writer. 
There has been relatively little research of this 
kind done in such schools, and an approach to the 
question of isolating the influence of a single 
teacher among the many with whom secondary pu- 
pils customarily work seemed to offer a challen- 
ging problem. 

It is within this context that the present study 
has been made. It should be noted that the meas- 
ures involved are derived from the reports of the 
pupils. Some respresentative hypotheses follow. 


1. Preclusive behaviors of teachers are negatively
related to the amount of self-initiated
work performed by the pupils.

2. Preclusive behaviors of teachers are negatively
related to the amount of required
work performed by the pupils.

3. Conjunctive behaviors of teachers are positively
related to the amount of required
work performed by the pupils.

4. Conjunctive behaviors of teachers are positively
related to the amount of self-initiated
work performed by the pupils.

5. Inclusive behaviors of teachers are positively
related to the amount of self-initiated
work performed by the pupils.

6. Inclusive behaviors of teachers are positively
related to the amount of required work
performed by the pupils, although this relationship
is weaker than that of inclusive behaviors
to self-initiated work.


The ‘‘Perception’’ Analysis 


The central concern of this research is the in- 
vestigation of the relationships of three measures 
of teacher behaviors to two dependent variables— 
the measures of required and self-initiated work 
performed by pupils. The behaviors of the teach- 
ers are classified into three categories called in- 
clusive, preclusive, and conjunctive. 

The data have been analyzed from two differ- 
ent points of departure. The first of these is 
termed the perception analysis; the second, the 
trait analysis. As the name of the former would




suggest, the perception analysis is an examination
of the relationships between the pupil’s perception
of the teacher and the corresponding
work scores of the pupil. This approach is principally
concerned with the individual pupil’s interpretation
of the teacher’s behavior. The analysis
seeks an answer to the following question:
Given certain teacher behaviors, what are the
relationships between the pupil’s perception of
such cues and the corresponding productivity
scores? This relationship can be embodied in a
correlation between the individual pupil’s perception
of a given teacher and his report of his required
or self-initiated work, this correlation
being computed for all the pupils of a given teacher
(i.e., a “group” as here defined). The significance
of such correlations is tested in two
ways: (1) by application of the sign test to the correlation
coefficients of the independent and dependent
measures over the 33 groups, and (2) by
the t-test for the significance of each of these
intra-group r’s (keeping in mind the arbitrary
definition of this research, in which group denotes
all the pupils reporting on a single teacher).
The basic data here are the raw scores of each
pupil.
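The perception analysis yields one correlation per teacher, computed only over that teacher's own pupils. A minimal Python sketch, assuming each questionnaire has already been reduced to a (teacher, perception score, work score) triple; the function names and the sample scores are hypothetical, not taken from the study.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Product-moment r computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def within_teacher_r(records):
    """records: iterable of (teacher_id, perception_score, work_score).

    Returns {teacher_id: r}: one intra-group r per teacher ("group").
    """
    groups = defaultdict(lambda: ([], []))
    for teacher, perception, work in records:
        xs, ys = groups[teacher]
        xs.append(perception)
        ys.append(work)
    return {t: pearson(xs, ys) for t, (xs, ys) in groups.items()}


# Hypothetical raw scores for two teachers.
data = [
    ("T1", 10, 20), ("T1", 12, 25), ("T1", 15, 24), ("T1", 18, 30),
    ("T2", 40, 12), ("T2", 42, 11), ("T2", 45, 9), ("T2", 47, 10),
]
rs = within_teacher_r(data)
signs = sum(r > 0 for r in rs.values())  # input to the sign test over groups
print(rs, "positive:", signs, "of", len(rs))
```

Each r then feeds two tests: its sign enters the count across groups, and its value is tested individually with t.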

The “trait” analysis, reported elsewhere,
deals with the averages of the measures. In
these statistical procedures the basic data are the
average scores of a group (and therefore of a teacher).
Both the trait and the perception approaches
are naturally combined in the analysis of variance
with covariance adjustment. The covariance
adjustment permits an assessment of among-group
(trait) relationships and of within-group
(perception) relationships. In this portion of the
data, regression coefficients for inclusive and
conjunctive scores, each by groups, are computed
and may be compared with the corresponding
required and self-initiated work scores for each
group, thus furnishing a picture of the covariance
within each group (covariance in perception)
and of the relationships among group means
(covariance in traits).⁸ Although the sample includes
987 pupils, the data are derived from
1786 questionnaires, since each pupil was asked
to report on two of three teachers in English,
arithmetic, and science. A few pupils did not
complete the second questionnaire.


Standard Error of Measurement and 
Reliability of the Scales 


In the course of the computations for the analysis
of variance mentioned above, an estimate
of the variance within groups was derived for each
variable. The square root of this error estimate
provides a measure of the standard error for
each of the five scales. The reliability coefficient
for each scale is also computed for a single
assessment of a teacher by a pupil. The formula
for the coefficient of a single assessment is


where subscripts w and t denote the variance esti- 
mate within-teachers and the total variance esti- 
mate respectively. Table I presents the results 
of the computation. 
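The standard error described above is the square root of the pooled within-teachers variance estimate. The formula for the single-assessment coefficient did not survive in this copy, so the sketch below ASSUMES the common form 1 − s²_w / s²_t, which uses only the two estimates the text names; treat that line as a placeholder, not as the author's formula. The scores are hypothetical.

```python
from math import sqrt


def within_and_total_variance(groups):
    """groups: list of score lists for one scale, one list per teacher.

    Returns (s2_within, s2_total):
      s2_within = SS_within / (N - k)   pooled within-teachers mean square
      s2_total  = SS_total  / (N - 1)   total variance estimate
    """
    all_scores = [x for g in groups for x in g]
    n, k = len(all_scores), len(groups)
    grand = sum(all_scores) / n
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return ss_within / (n - k), ss_total / (n - 1)


groups = [[10, 12, 11, 13], [15, 17, 16, 18], [8, 9, 11, 10]]  # hypothetical
s2_w, s2_t = within_and_total_variance(groups)
sem = sqrt(s2_w)               # standard error of measurement, as in the text
r_single = 1 - s2_w / s2_t     # ASSUMED single-assessment reliability form
print(f"SEM = {sem:.3f}, r_single = {r_single:.3f}")
```

Under this assumed form a small coefficient means that most of the variance lies among pupils rating the same teacher rather than among teachers.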

Inspection of the observed reliability coeffici- 
ents for the assessments of a single pupil shows 
all five coefficients to be quite small, indicating 
that the individual pupils’ ratings differ for differ- 
ent teachers. 


Correlational Analysis 


The next step in the statistical treatment of the 
data is an examination of the degree of relation- 
ship existing between the independent and the de- 
pendent variables. Since the problem is one of re- 
lationship between variables, the scores derived 
from the instruments are subjected to a correlational
analysis. Two tests of significance are applied
to the coefficients of correlation: (1) the
“sign” or “binomial series” test, and (2) the
t-test of the significance of a product-moment
coefficient of correlation as computed from a small
random sample.

The binomial series test is considered suitable
in the present research because the exploratory
nature of the study makes it seem desirable
to establish the presence or absence of a trend
in the relationships among variables without reference
to the magnitude of the trend. The null hypothesis
to be tested is that ρ = 0. If this is the
case, then the sample r values will differ
from zero solely by chance, and hence one
would expect to observe about as many positive
as negative values. The statistic to be used is the
number of positive (or negative) signs. The appropriate
test of the significance of this statistic
is to enter the table of cumulative binomial probabilities
for P = Q = .5 with the number N of signs and the number m
of negative (or positive) signs, and to read directly
the probability that such a distribution could be
attained by chance.⁹ The .01 level of significance
is adopted for the sign test.
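The table lookup described above can be computed directly. This sketch gives the one-tailed cumulative binomial probability for P = Q = .5, which agrees with the .06 the text later reports for four like signs (1/16 = .0625). The function name is hypothetical, and zero r's (which the text splits equally between the two signs) are not handled.

```python
from math import comb


def sign_test_p(n_signs: int, m_minority: int) -> float:
    """One-tailed sign test with P = Q = .5.

    Probability of m_minority or fewer signs of the less frequent kind
    among n_signs coefficients, if the population correlation is zero.
    """
    return sum(comb(n_signs, i) for i in range(m_minority + 1)) / 2**n_signs


# Sign counts reported in the text:
print(sign_test_p(33, 2))   # 2 negative coefficients among 33 groups
print(sign_test_p(4, 0))    # 4 science groups, all of one sign
print(sign_test_p(12, 3))   # principals: 3 negative among 12 coefficients
```

Only the number of signs enters the calculation, which is why the text pairs this test with the t-test when the size of each r matters.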

For the product-moment coefficients of correlation
computed for the relationships among the
variables, the appropriate test of significance, as
suggested by Lindquist,¹⁰ is the t-test. In the
present application, the .05 level of significance
is adopted.
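For a sample r from n pupils, this test uses t = r√(n − 2) / √(1 − r²), referred to Student's t with n − 2 degrees of freedom. To stay within the Python standard library, the sketch obtains the p-value by integrating the t density numerically (Simpson's rule). The text does not say whether one- or two-tailed probabilities were used, so both are printed. The example r comes from Table II; the function names are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_from_r(r: float, n: int) -> float:
    """t for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


def t_two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p for Student's t.

    Integrates the density over [0, |t|] with Simpson's rule (steps must be even).
    """
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def density(u: float) -> float:
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = density(0.0) + density(abs(t))
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    central = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central


# Example: r = -.28 in a group of 48 pupils (Table II, column PR).
t = t_from_r(-0.28, 48)
p2 = t_two_tailed_p(t, 48 - 2)
print(f"t = {t:.2f}, two-tailed p = {p2:.3f}, one-tailed p = {p2 / 2:.3f}")
```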

Thus, by an analysis of the signs of the r’s, 
the significance of the general direction of the re- 
lationship is determined. The t-test shows which 
of the r’s are significant. 

The zero-order coefficients of correlation 
were computed for each group for all combin- 
ations of independent with dependent variables. 


TABLE I

STANDARD ERROR OF MEASUREMENT AND RELIABILITY OF THE SCALES

[Table body illegible in source; columns give, for each of the five scales, the range of scores, the standard error, and the reliability of a single assessment.]




The intercorrelations among antecedents and
among consequents were also obtained.¹¹ Each
coefficient was computed twice, with additional
cross-checks for accuracy. Raw-score data
were used in the Pearson formula for the product-moment
correlation coefficient.¹²
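The raw-score form of the Pearson formula works directly from sums of scores, squares, and cross-products without first computing deviations. Below is a sketch of that form, with the deviation form alongside as one possible independent cross-check of the kind the text mentions. The scores are hypothetical.

```python
from math import sqrt


def pearson_raw(xs, ys):
    """Product-moment r from raw-score sums:
    r = (N*SXY - SX*SY) / sqrt((N*SXX - SX**2) * (N*SYY - SY**2))
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))


def pearson_dev(xs, ys):
    """The same coefficient from deviation scores, used as a check."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical inclusive (x) and self-initiated work (y) scores for five pupils.
x = [88, 75, 91, 60, 70]
y = [27, 20, 30, 18, 25]
r1, r2 = pearson_raw(x, y), pearson_dev(x, y)
print(f"raw-score r = {r1:.4f}, deviation r = {r2:.4f}")
```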

The Preclusive Variable. The coefficients of
correlation of preclusive with required scores
are presented in Table II, Column PR. The coefficients
r_PR show 18 negative signs and 15
positive signs. Although the original formulation
of the problem anticipated a negative relationship
between the preclusive behaviors of the
teacher and the required work of his pupils, the
preponderance of negative signs is not significant
(P = .24). None of the positive coefficients
is significant, but four of the negative coefficients
attain significance. In view of the results
of the sign test and of the small
number of significant r’s, it is concluded that
the original hypothesis as to the relationship is
not borne out by the evidence of the “Pupil Survey.”
On the other hand, the four significant r’s are
all negative, and this relationship
should not, perhaps, be completely discounted.
It is the guess of the writer that the preclusive
items were phrased in too extreme a manner,
and referred to behaviors too outré to be discriminating.
In any event, the evidence may not
be so completely negative as to discourage further
research in this area.

The findings as to the relationship of preclu- 
sive behaviors to self-initiated work are very 
similar to those immediately above. Table II, 
Column PS, shows 17.5 negative and 15.5 posi- 
tive signs (if zero r’s are distributed equally in 
each category). The sign test shows no prepon- 
derance that can be attributed to factors other 
than chance (P = .36). Six coefficients are sig- 
nificant; five of these have negative signs as an- 
ticipated. 

The Conjunctive Variable. The conjunctive
items are designed to provide a measure of the
teacher’s (1) skill in classroom management,
(2) level of demands upon his pupils, (3) ability
to develop interest in the classroom experiences,
and (4) ability to communicate with his
pupils. It was hypothesized that the conjunctive
score would be positively related to both consequents,
especially to the amount of required
work performed. The results of the correlational
analysis are summarized in Table III, which
also presents similar data for the inclusive variable,
to be discussed in the next subsection.

Of 33 coefficients r_CS, two are negative in
sign; both are non-significant. If the sign test is
applied to the distribution of positive and negative
correlation values, under the hypothesis
that they are samples from a population in which
the true correlation is zero, the probability of
two negative signs and 31 positive signs is far
less than .01, and the null hypothesis may be rejected
with confidence. Sixteen of the coefficients
of conjunctive with self-initiated scores are significant
at or above the .05 level.

Comparable tabulations of the coefficients of 
conjunctive with required scores show only one 
negative sign (r = -.06). The null hypothesis as 
to sign may be rejected with confidence. Seven- 
teen of the r’s are significant at or above the .05 
level. 

It is possible to say that the evidence of the 
two tests applied to the relation of the conjunctive 
to the required and self-initiated scores gives 
strong indication that the behaviors measured by 
the conjunctive items are perceived by the pupils 
in a manner indicating a fairly stable relationship 
between this variable and the consequents. No evidence
is found to show that conjunctivity is more
closely related to required than to self-initiated
scores, a significant r_CS having been attained in
16 groups, as compared to 17 groups in which
r_CR is significant. Nor does there appear to be
any appreciable difference in the magnitudes of
the r’s in the two arrays.

If some account is taken of the attenuating fac- 
tors resident in the measuring instrument and its 
administration, there appears to be some reason 
for optimism concerning the value of further re- 
search in this area. Of the total of 66 r’s com- 
puted for the conjunctive variable with the criter- 
ion variables, half are significant. This fact 
might be interpreted to mean that the results of 
the present research could serve as a point of de- 
parture for further attempts to predict the pro- 
ductive behavior of pupils from their perception 
of the behaviors of their teachers. 

The Inclusive Variable. The relationships between
the inclusive variable and the criterion variables
appear to be more pronounced than those
of any of the other antecedent-consequent pairs.
Table III, in which the results of the correlation
study are found, shows that the sign of
the coefficient r_IR for inclusive with required
scores within groups is negative in only two
groups, and both r’s are non-significant (r = -.01;
r = -.02). Thirty coefficients are positive; one
is zero. The null hypothesis may safely be
rejected (P < .01).

The t-test of significance applied to the r_IR’s
shows 21 significant coefficients among those for 
the 33 groups. Table IV summarizes the distri- 
bution of r’s, significant at or above the .05 lev- 
el, by school and by subject. 

The results in the science groups are especial- 
ly noteworthy, since the coefficients of all these 
groups are significant. The sign test is not sen- 
sitive with only four cases, since the probability 
of getting four similar signs by chance is .06. 
Another fact should be noted, however; the r’s 
for these same four groups are significant also 
in the correlation of inclusive with self-initiated 


TABLE II

COEFFICIENTS OF CORRELATION OF PRECLUSIVE SCORES WITH REQUIRED AND
SELF-INITIATED SCORES, FOR EACH TEACHER

                                      Correlation
School   Subject   Teacher     n      PR        PS
I        Eng.                  46     .03
I        Eng.                  60    -.07      -.17
I        Eng.                  48    -.28*     -.40**
I        Eng.                  22    -.28      -.16
I        Eng.                  55     .09       .03
I        Eng.                  29     .07       .06
I        Arith.                80    -.06      -.12
I        Arith.                79     .09      -.18*
I        Arith.                61    -.14      -.38**
I        Arith.                33     .18
II       Eng.                 112    -.05      -.01
II       Eng.                  93    -.16       .00
II       Arith.                88    -.04       .00
II       Arith.               122    -.04       .07
III      Eng.                  24    -.37*     -.22
III      Eng.                  72    -.13      -.06
III      Arith.     15         13     .34       .54*
III      Arith.     17         83     .04
III      Sci.       18        111    -.09
IV       Eng.       21         20     .23       .08
IV       Eng.       22         21    -.18       .30
IV       Eng.       26         41     .14       .00
IV       Eng.       27         21     .24       .20
IV       Arith.     20         21    -.11       .06
IV       Arith.     24         54    -.04      -.21
IV       Sci.       25         96     .08       .02
V        Eng.       28         18     .38
V        Eng.       31         15    -.42      -.51*
V        Eng.       33         19    -.46*     -.21
V        Eng.       23         49     .05      -.08
V        Arith.     32         72    -.07      -.03
V        Sci.       29         54     .07      -.06
V        Sci.       30         54    -.22*     -.30*

Number of Significant r’s                     4         6
Number of Significant r’s with Same Sign

 * Significant at .05 level.
** Significant at .01 level.




TABLE III


COEFFICIENTS OF CORRELATION OF CONJUNCTIVE AND OF INCLUSIVE SCORES EACH WITH
REQUIRED AND SELF-INITIATED SCORES, FOR EACH TEACHER


                                    Correlation
School  Subject  Teacher   n      CR      CS      IR      IS


-19 


OW 1H Ww 


I 

I 

I 

I 

I 

I 

I 

I 

I 

I 
Il 
Il 
Ol 
IV 
IV 
IV 


Number of significant r’s 


* Significant at the .05 level. 
**Significant at the .01 level. 


Eng. . 46 . 24* 
Eng. 60 .35** .30* .25* 
Eng. 48 51** 51** 44** 60** 
Eng. 22 . 36* 58** .47* 
Eng. 55 . 28* . 29* 
Eng. 29 . 26 .00 . 00 
Arith. 80 .19* . 08 34** 
Arith. 79 -.06 .08 .17 . 28** 
Arith. 10 33 . 26 .07 . 38* -18 
Eng. 13 112 16* 
Eng. 14 93 .09 
Arith. 11 88 .16 .05 . 28** . 26** 
Eng. 16 24 25 65** 62** 
Eng. 19 72 . 28** . 20* . 36** .42** 
Arith. 15 13 yy -.17 -.01 -.10 
Arith. 17 83 . 04 -10 . 26** . 36** 
Sci. 18 111 33** 46** 47** 
Eng. 21 20 .18 .18 . 48* 
Eng. 22 21 .37* .45* .59** 
Eng. 26 41 .01 06 15 
IV - Eng. 27 21 -.04 
IV Arith. *20 21 52** . 49* 
IV Arith. 24 54 da** . 28* . 20 38** 
IV Sci. 25 96 18* 33** 40** 
Vv Eng. 28 18 .02 .09 -.02 -.03 
Vv Eng. 31 15 59** 53* . 48* 
Vv Eng. 33 19 32 oot 50* . 36 
Eng. ‘23 49 .32* 39** 28* 35** 
Vv Sci. 29 54 45** . 46** 
Vv Sci. 30 54 35** 39** 49** 
17 16 21 25 


TABLE IV

DISTRIBUTION OF COEFFICIENTS OF CORRELATION OF INCLUSIVE WITH REQUIRED
SCORES, BY SCHOOLS AND BY SUBJECTS

                     English              Arithmetic            Science
          No. of   Total  Significant   Total  Significant   Total  Significant
School    Groups   Groups r’s           Groups r’s           Groups r’s
I           10       6       4            4       2
II           4       2       1            2       2
III          5       2       2            2       1             1       1
IV           7       4       1            2       0             1       1
V            7       4       3            1       1             2       2
Total       33      18      11           11       6             4       4


TABLE V

DISTRIBUTION OF COEFFICIENTS OF CORRELATION OF INCLUSIVE WITH SELF-
INITIATED SCORES, BY SCHOOLS AND BY SUBJECTS

                     English              Arithmetic            Science
          No. of   Total  Significant   Total  Significant   Total  Significant
School    Groups   Groups r’s           Groups r’s           Groups r’s
I           10       6       5            4       3
II           4       2       2            2       2
III          5       2       2            2       1             1       1
IV           7       4       2            2       1             1       1
V            7       4       2            1       1             2       2
Total       33      18      13           11       8             4       4




work (see Table V). It is interesting to speculate,
therefore, on the possibility that subject
differences are a reality, and that the relationships
among the variables may be universal for
science, but not for English or arithmetic. In
these latter subjects, 61 and 54 percent, respectively,
of the r_IR’s are significant, with 72 percent
of the r_IS’s significant in both English and
arithmetic.

Some of the questions that come to mind are: 
(1) Is the nature of the subject-matter of science 
such that the behaviors of the teacher become a 
factor so preponderant as to override the attenu- 
ating factors found in the teaching of English and 
arithmetic? (2) Are there factors in the train- 
ing or the personality of science teachers that 
maximize the influence of their behaviors upon 
the pupils’ work? (3) Is the attitude of the pupils
so neutral vis-à-vis the experiences of science
classes that the major crystallizing agency becomes
the behaviors of the science teacher?

At the least, the phenomenon of significance
in the coefficients of all the science groups suggests
interesting possibilities as to subject differences
that might provide a fruitful field for research.

In the original statement of the problem of
this study, the hypothesis was stated that the inclusive
scores are positively related to the
amount of self-initiated work reported by pupils.
The tabulation of r_IS, Table III, shows two negative
r’s, both non-significant (r = -.10; r =
-.03). The sign test permits rejection of the
null hypothesis that ρ = 0. The level of significance
is beyond .01.

The relationship between the inclusive scores 
and the self-initiated work scores appears to be 
the strongest of the antecedent-consequent relationships.
Twenty-five of the 33 coefficients
r_IS attain significance at or beyond the .05 level.
The distribution of these r’s by schools and sub- 
jects is summarized in Table V. 

The results of the application of the sign test
and of the t-test confirm the hypothesis that, in
the perception of the pupils, inclusive behaviors
of the teacher are positively related to self-initiated
work.

In view of the findings as to the correlation of 
inclusive scores with required and self-initiated 
work scores within groups, it is of interest to 
make a similar examination for the total sample 
population of pupils and for the pupils in each 
community separately. However, if these r’s 
are computed on the group means of the independ- 
ent (I) and the dependent (R and S) variables, the 
observed coefficient of correlation may be spurious
as a result of throwing together groups having
unlike means.¹³

A solution to this difficulty may be found by
utilizing some values computed in the course of
the analysis of variance with covariance adjustments.
Three correlation coefficients may be derived
from these data: (1) one based on the total sums
of squares, (2) one based on the within-sums, and
(3) one based on the between-sums. “The between-groups
r is actually the correlation between the
X [I] means and the Y [R or S] means for the
groups . . . [and] an appreciable between-groups r
indicates that the total r is spurious; this spuriousness
is eliminated when r is computed from the
within sums.”¹⁴
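The three coefficients come from splitting the sums of squares and cross-products into between-groups and within-groups parts. A minimal sketch, assuming each group is a list of (X, Y) pairs for one teacher; the degrees of freedom follow the pattern of Table VI (N − 2 total, k − 2 among groups, N − k − 1 within groups; for 1786 questionnaires in 33 groups these give 1784, 31, and 1752). The function name and toy data are hypothetical.

```python
from math import sqrt


def r_total_between_within(groups):
    """groups: list of lists of (x, y) pairs, one list per teacher group.

    Returns {"total"|"between"|"within": (r, df)}.
    """
    pairs = [p for g in groups for p in g]
    n, k = len(pairs), len(groups)
    gx = sum(x for x, _ in pairs) / n
    gy = sum(y for _, y in pairs) / n
    sxx_t = syy_t = sxy_t = 0.0
    sxx_b = syy_b = sxy_b = 0.0
    for g in groups:
        m = len(g)
        mx = sum(x for x, _ in g) / m
        my = sum(y for _, y in g) / m
        # Between-groups sums: weighted deviations of group means.
        sxx_b += m * (mx - gx) ** 2
        syy_b += m * (my - gy) ** 2
        sxy_b += m * (mx - gx) * (my - gy)
        for x, y in g:
            sxx_t += (x - gx) ** 2
            syy_t += (y - gy) ** 2
            sxy_t += (x - gx) * (y - gy)
    # Within-groups sums are the remainder: total = between + within.
    sxx_w, syy_w, sxy_w = sxx_t - sxx_b, syy_t - syy_b, sxy_t - sxy_b

    def corr(sxy, sxx, syy):
        return sxy / sqrt(sxx * syy)

    return {
        "total": (corr(sxy_t, sxx_t, syy_t), n - 2),
        "between": (corr(sxy_b, sxx_b, syy_b), k - 2),
        "within": (corr(sxy_w, sxx_w, syy_w), n - k - 1),
    }


groups = [
    [(-1, 1), (0, -2), (1, 1)],   # hypothetical group with means (0, 0)
    [(9, 11), (10, 8), (11, 11)],  # hypothetical group with means (10, 10)
]
print(r_total_between_within(groups))
```

In the toy data the group means of X and Y rise together, while inside each group X and Y are uncorrelated. The total r is therefore high, the between-groups r is 1, and the within-groups r is 0, which is the kind of spuriousness the quoted passage describes.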

In order to avoid such spuriousness, the de- 
sired coefficients of correlation for the whole 
sample and for each community have been computed
from the within-sums. The total r’s, the between-groups
r’s, and the within-groups r’s are
presented in Table VI.

The data of Table VI reveal that every correlation
coefficient except the among-groups r_IR of
Community B is significant at the .01 level. The
single exception is significant at the .025 level.
Although the substantial among-groups r’s def- 
initely indicate some spurious value in the total r’s, 
the within-group r’s are valid and significant. 
Thus it seems fair to conclude that the individual 
pupil’s perception of his teacher’s inclusiveness 
tends to be related to the pupil’s work scores. 
The hypotheses relating inclusive scores to re- 
quired and to self-initiated work scores within the 
framework of the pupil’s perception seem to be ten- 
able. (Since the conjunctive scores appear to be 
quite consistent with and related to inclusive 
scores, it appears likely that a similar analysis 
for conjunctive scores might furnish evidence for 
substantially similar conclusions.) The results 
of the analyses also suggest that a productive field 
for further research may be an intensive examin- 
ation of the differential effect of pupils’ percep- 
tions of the same teacher’s inclusiveness. 

The possibility of increasing the accuracy of
prediction by using both the inclusive and conjunctive
variables in a multiple regression equation
was considered negligible. The variables proved
to be very highly intercorrelated.¹⁵ The 33 coefficients
range from r_IC = .44 to r_IC = .98.
The median correlation is r_IC = .67, and all the
r’s are significant at the .05 level.
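Why a second, highly correlated predictor adds little is clear from the squared multiple correlation for two predictors, R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). In the sketch both predictor-criterion correlations are set to a hypothetical .40; r_12 = .67 is the median r_IC reported above. The function name is hypothetical.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of Y on X1 and X2 from zero-order r's."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# Hypothetical validities of .40 for both predictors.
single = 0.40**2
for r12 in (0.0, 0.67, 0.90):
    both = r_squared_two_predictors(0.40, 0.40, r12)
    print(f"r_12 = {r12:.2f}: R^2 = {both:.3f} (one predictor alone: {single:.3f})")
```

As r_12 rises toward 1, R² falls back toward the value for a single predictor, so the second variable contributes little new information.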
It may be said that the “Pupil Survey” failed
to provide items by which pupils could differentiate
between those teachers who were “conjunctive-but-preclusive”
and those who were “disjunctive-but-inclusive,”
if indeed such differences exist at
all. There is a further possibility that the pupils
may perceive their teachers in such a unitary
manner that the “halo effect” of the strong, overriding
impression makes differentiation impossible.
As defined by the “Pupil Survey,” the perceptions
of inclusiveness and conjunctivity seem
to be highly interrelated. It is possible to sur- 
mise, therefore, that inclusive behaviors consist- 
ently accompany conjunctive behaviors and that 
the high degree of intercorrelation between the 


TABLE VI

CORRELATION COEFFICIENTS OF INCLUSIVENESS WITH REQUIRED WORK SCORES AND SELF-
INITIATED WORK SCORES, FOR THE TOTAL SAMPLE AND FOR EACH COMMUNITY

                           df     r_IR    P         df     r_IS    P
Total Sample
   Total r               1784     .38    <.01      1784     .40    <.01
   Among Groups r          31     .63    <.01        31     .62    <.01
   Within Groups r       1752     .28    <.01      1752     .35    <.01
Community A
   Total r                926     .36    <.01       926     .38    <.01
   Among Groups r          12     .64    <.01        12     .63    <.01
   Within Groups r        913     .22    <.01       913     .30    <.01
Community B
   Total r                856     .37    <.01       856     .42    <.01
   Among Groups r          17     .51    <.025       17     .59    <.01
   Within Groups r        838     .34    <.01       838     .39    <.01


TABLE VII

SCHOOL MEANS OF ALL ANTECEDENT AND CONSEQUENT SCORES

                       Antecedent Variables          Consequent Variables
School     n        I         P          C            R         S
I         515     88.51     45.20     108.13        52.23     27.32
II        416     71.43     62.01     102.71        43.38     20.56
III       305     70.25     59.03      98.89        42.26     19.34
IV        274     79.41     50.33     108.96        45.98     25.35
V         314     77.41     53.91     104.60        41.53     24.23




two merely reflects this phenomenon. 

Insofar as the overall findings of the preced- 
ing analysis are concerned, the inclusive and 
conjunctive scores of a teacher may have some 
validity as indices of the teacher’s ability to motivate
his pupils, if the criteria of this ability
are stated in terms of the pupils’ perceptions
of the amount of required and self-initiated work
they do.

There may be some interest attached to a subsidiary
treatment of the scores derived from the
“Pupil Survey.” After the pre-test had been
scored, certain rather striking regularities 
were found when the mean score of each school 
was computed for each variable. Similar compu- 
tations were made for this present research. 
These are presented in Tables VII and VIII. 

When these means are rank-ordered (the preclusive
variables with negative orientation), they
provide the data of Table VIII.
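The ranking rule can be checked against the tabled means. The sketch ranks each column of the school means in Table VII (rank 1 = most favorable). As the text describes, it reverses the preclusive column so that the lowest preclusive mean ranks first, and it reproduces the ranks printed in Table VIII. The names `MEANS`, `HIGHER_IS_BETTER`, and `rank_schools` are illustrative.

```python
# School means from Table VII, in the order (I, P, C, R, S).
MEANS = {
    "I":   (88.51, 45.20, 108.13, 52.23, 27.32),
    "II":  (71.43, 62.01, 102.71, 43.38, 20.56),
    "III": (70.25, 59.03, 98.89, 42.26, 19.34),
    "IV":  (79.41, 50.33, 108.96, 45.98, 25.35),
    "V":   (77.41, 53.91, 104.60, 41.53, 24.23),
}
VARIABLES = ("I", "P", "C", "R", "S")
# Preclusive has negative orientation: the lowest mean receives rank 1.
HIGHER_IS_BETTER = {"I": True, "P": False, "C": True, "R": True, "S": True}


def rank_schools(means):
    """Return {school: [rank for each variable]} with 1 = most favorable."""
    ranks = {school: [0] * len(VARIABLES) for school in means}
    for j, var in enumerate(VARIABLES):
        order = sorted(means, key=lambda s: means[s][j],
                       reverse=HIGHER_IS_BETTER[var])
        for rank, school in enumerate(order, start=1):
            ranks[school][j] = rank
    return ranks


for school, r in rank_schools(MEANS).items():
    print(school, *r)
```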

With all the limitations of such rank orders 
firmly in mind, it is still a matter of interest to 
note that the school as characterized by these 
mean scores seems to behave in a highly sta- 
ble fashion in terms of the variables of this study. 
It is as though there were a kind of pervasive
and characteristic “climate” for each school. It
may be remarked in passing that this climate appears
to be independent of the socioeconomic description
of the community in which the school is
located. It is the guess of the writer that a ma- 
jor influence upon the climate of the school is the 
educational and administrative policy of the prin- 
cipal. 

It seems fair to say, at least, that the rank- 
ordered means of schools lend support to the 
idea that there is some intrinsic meaning in the 
variables of this study. 


‘‘The Teacher as Seen by His Colleagues’’ 


The instrument bearing this title was designed 
to measure the opinion of the principal of a school 
on the inclusive, preclusive, and conjunctive be- 
haviors of his teachers. In order to be able to 
compare these scores with corresponding esti- 
mates furnished by the pupils on the Survey, the 
items in the former instrument were intentionally
made up as paraphrases of items in the Survey.
The vocabulary of the questionnaire submitted 
to the principals is naturally much less restrict- 
ed than that to be found in the ‘‘Pupil Survey’’. 

Four principals filled out and returned the 
questionnaire for 28 teachers. These scores 
were then matched with the mean score of the 
groups for the corresponding teacher, and the 
coefficient of correlation was computed for each 
principal. Table IX presents the results of the 
computation. 

Three negative signs appear, all non-signifi- 
cant, among the 12 coefficients. Of the nine pos- 


itive r’s, three are significant. The sign test 
gives an indication of a trend, with some support- 
ing evidence from the appearance of three signifi- 
cant r’s. However, in view of the small numbers 
of cases and of the appearance of three sizable 
negative r’s, the statements that may be made 
concerning the relationships should err, if at all, 
on the side of conservatism. The perception of 
the principals as to the behaviors of their teach- 
ers cannot be said to be significantly related to 
the perceptions of the pupils. The principals of 
schools I and V show some appreciable trend 
toward agreement with the pupils, whereas the 
principal of school III tends to disagree with the 
pupils’ reports. 


“The Teacher’s Estimate of the Pupils’ Work”


Each teacher was asked to rate each of his pu- 
pils on the amount of required and self-initiated 
work performed. Twenty-nine of the 33 teachers 
sent in usable returns. The teacher’s scores for 
each pupil were paired with the corresponding es- 
timate furnished by the pupil himself. Coeffi- 


cients of correlation were computed for the con- 
sequent variables. The results are presented in 
Table X. 

In three of 29 groups, the sign of r_RtRp is negative.
One negative sign appears in the array of
r_StSp. All four negative coefficients are non-significant,
and the sign test demonstrates that such
preponderances of positive values make it highly
unlikely that the population value is zero (P < .01 for each
array). The appearance of 14 significant r’s in 29
for required work, and of 16 significant r’s in 29 for
self-initiated work, lends strong support to the
statement that the teacher’s perceptions of the
amount of required and self-initiated work performed
by his pupils have a positive relationship
to the pupils’ perceptions of the amounts of such
work they do. This finding tends to corroborate 
other findings indicating that the researcher may, 
after some few precautions have been taken, place 
some reliance upon the reports of pupils as to 
their own performance of school work. 


Summary of Findings of the “Perception” Analysis


1. The individual pupils’ ratings, both of the teachers’ behaviors and of the amounts of work performed, tend to differ from teacher to teacher.

2. The principals’ ratings of the teachers’ behaviors are not consistently related to the pupils’ ratings of the teachers.

3. The teachers’ estimates of their pupils’ required and self-initiated work are significantly related to the pupils’ own estimates of their work.

4. The relationship of the preclusive behaviors to the work scores is not clear, the evidence being inconclusive.

5. Strong evidence is adduced to show that, in the perception of the pupils, scores on the inclusive and conjunctive behaviors of teachers are related to scores on the performance of required and self-initiated work by pupils.


TABLE VIII

RANK ORDER OF SCHOOL MEANS OF ALL ANTECEDENT AND CONSEQUENT SCORES

                     Antecedent Variables     Consequent Variables
School     n          I      P      C            R      S
I         515         1      1      2            1      1
II        416         4      5      4            3      4
III       305         5      4      5            4      5
IV        274         2      2      1            2      2
V         314         3      3      3            5      3


TABLE IX

COEFFICIENTS OF CORRELATION OF THE PRINCIPAL’S RATINGS WITH THE AVERAGE PUPIL-RATINGS ON THE INCLUSIVE, PRECLUSIVE, AND CONJUNCTIVE VARIABLES, FOR 28 TEACHERS

School    No. of Teachers    $r_{I_1I_2}$ᵃ    $r_{P_1P_2}$ᵃ    $r_{C_1C_2}$ᵃ
I               10              .61*             .40
III                             -.49             .01             -.54
IV               7              .48             -.29
V                6              .36              .83*
Total           28

ᵃ Subscript 1 indicates principals’ ratings; subscript 2 indicates pupils’ ratings.
* Significant at the .05 level.


TABLE X

COEFFICIENTS OF CORRELATION OF TEACHER’S ESTIMATE OF PUPILS’ REQUIRED AND SELF-INITIATED WORK WITH PUPILS’ SCORES ON THESE VARIABLES

School   Subject   Teacher   No. of Pupils   $r_{R_tR_p}$ᵃ   $r_{S_tS_p}$ᵃ
I        Eng.         3           48             .41*            .46*
I        Eng.         4           53             .27*            .58*
I        Eng.         6           48             .23             .30*
I        Eng.         7           22             .27             .17
I        Eng.         8           55             .14             .48*
I        Arith.       2           80             .23*            .23*
I        Arith.       5           62             .17             .22*
I        Arith.      10           33             .26             .20
II       Eng.        13          110             .14             .13
II       Eng.        14           93             .09             .15
II       Arith.      11           88             .05             .17
II       Arith.      12          125             .15*            .00
III      Eng.        16           24             .22             .21
III      Eng.        19           71             .32*            .40*
III      Arith.      15           13            -.22             .11
III      Arith.      17           75             .23*            .10
III      Sci.        18          112            -.01             .68*
IV       Eng.        21           20            -.11             .42*
IV       Eng.        22           18             .43*            .57*
IV       Eng.        26           41            -.16             .00
IV       Eng.        27           21             .52*            .21
IV       Arith.      20           21             .40*            .38*
IV       Arith.      24           53             .25*            .22
IV       Sci.        25           98             .16             .48*
V        Eng.        28           18             .60*            .42*
V        Eng.        31           14             .71*            .64*
V        Eng.        23           42            -.05            -.02
V        Arith.      32           69             .56*            .34*
V        Sci.        30           53             .27*            .35*

Number of significant r’s                        14              16

ᵃ Subscripts t and p indicate teachers’ and pupils’ estimates, respectively.
* Significant at the .05 level.


Some Implications of the Research 


The results observed in this study may be of 
interest to researchers in education, to persons 
involved in classroom practice, and to psycholo- 
gists. 

For educational researchers there may be 
some value in the stability and discriminative 
power of the pupils’ scores on items of teacher 
behavior. These items seem to have the virtues
of specificity and clear definition. They may 
prove useful in studies of classroom competence. 
It is also perhaps worth noting that the concept 
of amounts of pupil work as variables intervening 
closely before pupil change may be productive, 
since there is some basis for confidence in the 
pupils’ reports of their own work. 

There is some possibility that the findings of 
this study may have at least a limited applicabil- 
ity in the classroom, since most educators are 
patently interested in ways of increasing pupil 
productivity. Furthermore, the strong relation- 
ship between the inclusive behavior of teachers 
and the self-initiated work of their pupils must 
be of some moment to educators who place great 
reliance upon theories of education in which the 
pupil’s interest, his self-reliance, his creativity, 
and his self-initiated activities play so important 
a role. 

And finally, at least two additional elements of this research may hold some interest for psychologists. The first is that some indirect evidence is found that would tend to reinforce Miller and Dollard’s hypotheses relating to the gradients of approach and avoidance. The second is that the research may offer some clues for the psychologist concerned with the relationships of basic personality variables to the overt classroom behaviors of teachers.


FOOTNOTES


* This article is based on a doctoral thesis prepared at the Harvard Graduate School of Education. A detailed presentation of the theoretical structure upon which the present analysis is based is available in an article entitled “Theory and Design of a Study of Teacher-Pupil Interaction,” appearing in Harvard Educational Review, XXVI (Fall 1956), pp. 315-42.

1. N. E. Miller and J. Dollard. Social Learning and Imitation (New Haven: Yale University Press, 1941).

2. H. A. Murray. Explorations in Personality (New York: Oxford University Press, 1938).

3. Murray, op. cit.

4. K. Lewin and others. “Patterns of Aggressive Behavior in Experimentally Created ‘Social Climates’,” Journal of Social Psychology, X (May 1939).

5. H. H. Anderson and others. “Studies of Teachers’ Classroom Personalities, III,” Applied Psychology Monographs, No. 11 (Stanford: Stanford University Press, 1946).

6. R. B. Cattell. Description and Measurement of Personality (New York: World Book Co., 1946).

7. The description of the methods and findings of the trait analysis is presented in the second article, entitled “The Behavior of Teachers and the Productive Behavior of Their Pupils: II. ‘Trait’ Analysis,” in this Journal.

8. These data have been plotted and are presented in graphic form in Chapter V of the original doctoral dissertation at the Harvard Graduate School of Education.

9. P. O. Johnson. Statistical Methods in Research (New York: Prentice-Hall, 1949), pp. 57-8.

10. E. F. Lindquist. Statistical Analysis in Educational Research (Boston: Houghton Mifflin Co., 1940), pp. 210-11.

11. These tables may be found in Appendix D of the original thesis, pp. 171-74.

12. As presented by Q. McNemar, Psychological Statistics (New York: John Wiley and Sons, 1949), p. 96, formula 30a.

13. McNemar, op. cit., p. 323.

14. Ibid., p. 322.

15. For all the intercorrelations not reported in this article, tables are available in Appendix D, pp. 171-74, of the author’s original dissertation.


BIBLIOGRAPHY


Allport, Gordon W. The Nature of Personality: Selected Papers (Cambridge, Mass.: Addison-Wesley Press, 1950).

Allport, Gordon W. Personality (New York: Henry Holt and Co., …).

Allport, Gordon W. “Psychology of Participation,” Psychological Review, LII (May 1945), pp. 117-39.

Anderson, Harold H. “Domination and Social Integration in the Behavior of Kindergarten Children in an Experimental Play Situation,” Journal of Experimental Education, VIII (December 1939), pp. 123-31.

Anderson, Harold H. “Domination and Socially Integrative Behavior,” in Child Behavior and Development (New York: McGraw-Hill Book Co., 1943), pp. 459-83.

Anderson, Harold H., and Brewer, Joseph E. “Studies of Teachers’ Classroom Personalities, II,” Applied Psychology Monographs, No. 8 (Stanford, California: Stanford University Press, 1946).

Anderson, Harold H., and others. “Studies of Teachers’ Classroom Personalities, III,” Applied Psychology Monographs, No. 11 (Stanford, California: Stanford University Press, 1946).

Barr, Arvil S. “The Measurement and Prediction of Teacher Efficiency,” Review of Educational Research, XVI (June 1946), pp. 203-208.

Barr, Arvil S. “The Measurement and Prediction of Teaching Efficiency: A Summary of Investigations,” Journal of Experimental Education, XVI (June 1948), pp. 503-83.

Bryan, R. C. “Why Student Reactions to Teachers Should be Evaluated,” Educational Administration and Supervision (October …), pp. ….

Bush, Robert N. “A Study of Student-Teacher Relationships,” Journal of Educational Research, XXXV (May 1942), pp. 645-56.

Cattell, Raymond B. Description and Measurement of Personality (New York: World Book Co., 1946).

Cook, Walter W., and Leeds, Carroll H. “Measuring the Teaching Personality,” Educational and Psychological Measurement, VII (Autumn 1947), pp. 399-410.

Dollard, John, and Miller, Neal E. Personality and Psychotherapy (New York: McGraw-Hill Book Co., 1950).

Domas, Simeon J. Report of an Exploratory Study of Teacher Competence (Cambridge, Mass.: New England School Development Council, 20 Oxford St., 1950).

Jersild, Arthur T., and others. “An Evaluation of Aspects of the Activity Program in the New York City Public Elementary Schools,” Journal of Experimental Education, VIII (December 1939), pp. 166-207.

Jersild, Arthur T., and others. “A Further Comparison of Pupils in ‘Activity’ and ‘Non-Activity’ Schools,” Journal of Experimental Education, IX (June 1941), pp. 303-9.

Johnson, Palmer O. Statistical Methods in Research (New York: Prentice-Hall, 1949).

Jones, Ronald DeVall. “The Prediction of Teaching Efficiency from Objective Measures,” Journal of Experimental Education, XV (September 1946), pp. 85-100.

La Duke, C. V. “The Measurement of Teaching Ability,” Journal of Experimental Education, XIV (September 1945), pp. 75-100.

Leeds, Carroll H. “A Scale for Measuring Teacher-Pupil Attitudes and Teacher-Pupil Rapport,” Psychological Monographs, LXIV, No. 6 (1950), pp. 1-24.

Lewin, Kurt. “Psychology and the Process of Group Living,” Journal of Social Psychology, XVII (February 1943), pp. 113-31.

Lewin, Kurt, and others. “Patterns of Aggressive Behavior in Experimentally Created ‘Social Climates’,” Journal of Social Psychology, X (May 1939), pp. 271-99.

Lindquist, E. F. Statistical Analysis in Educational Research (Boston: Houghton Mifflin Co., 1940).

Lins, Leo J. “Prediction of Teaching Efficiency,” Journal of Experimental Education, XV (September 1946), pp. 2-60.

Lippitt, Ronald. “An Experimental Study of the Effect of Democratic and Authoritarian Group Atmospheres,” University of Iowa Studies in Child Welfare, XVI (…), pp. 45-195.

Lippitt, Ronald, and White, Ralph K. “The ‘Social Climate’ of Children’s Groups,” in Child Behavior and Development (New York: McGraw-Hill Book Co., 1943), pp. 485-508.

McCall, William A. Measurement of Teacher Merit (Raleigh, N.C.: State Superintendent of Public Instruction, 1952).

McNemar, Quinn. Psychological Statistics (New York: John Wiley and Sons, 1949).

Miller, Neal E., and Dollard, John. Social Learning and Imitation (New Haven: Institute of Human Relations, Yale University Press, 1941).

Murray, Henry A. Explorations in Personality (New York: Oxford University Press, 1938).

Orleans, Jacob S., and others. “Some Preliminary Thoughts on the Criteria of Teacher Effectiveness,” Journal of Educational Research, XLV (May 1952), pp. 641-48.

Raths, L. E. “Dangers of Appraising Teaching Efficiency,” School Executive, LXVII (April 1948), pp. 55-56.

Rolfe, J. F. “The Measurement of Teaching Ability: Study Number Two,” Journal of Experimental Education, XIV (September 1945), pp. 52-74.

Rostker, Leon E. “The Measurement of Teaching Ability: Study Number One,” Journal of Experimental Education, XIV (September 1945), pp. 6-51.

Rostker, Leon E. “A Method for Determining Criteria of Teaching Ability in Terms of Measurable Pupil Changes,” Educational Administration and Supervision, XXVIII (January 1942), pp. 1-19.

Ryans, David G., and Wandt, Edwin. “A Factor Analysis of Observed Teacher Behaviors in the Secondary School: A Study of Criterion Data,” Educational and Psychological Measurement, XII (Winter 1952), pp. ….

Withall, John. “The Development of a Technique for the Measurement of Social-Emotional Climate in Classrooms,” Journal of Experimental Education, XVII (March 1949), pp. 347-361.


JOURNAL OF EXPERIMENTAL EDUCATION
(Volume 27, December 1958)


THE BEHAVIOR OF TEACHERS AND THE PRODUCTIVE
BEHAVIOR OF THEIR PUPILS:
II. “TRAIT” ANALYSIS*


MORRIS L. COGAN 
Harvard University 


ATTEMPTS TO appraise the competence of teachers have commonly utilized one of two criterion measures. The first of these is frequently taken in terms of the opinions of principals, supervisors, and experts as to the effectiveness of the teacher. The second is, commonly, some estimate of pupil change, or growth, or achievement. Both paradigms seem to have serious limitations in their applicability to research purposes. The opinions of persons considered competent to judge the effectiveness of a teacher achieve acceptable reliability only with great difficulty; they are also, almost transparently, too far removed from the logically defensible criterion of pupil change. On the other hand, the attempt to measure pupil change also encounters certain stubborn problems: tests are lacking to measure many of the most important pupil changes, and it is, in addition, extremely difficult to isolate the influences of specific teachers upon pupils in programs of departmentalized instruction.

The present study seeks an answer to some 
of these methodological problems. The independ- 
ent measures are taken in terms of specific, ob- 
servable teacher behaviors, avoiding recourse 
to expert opinion; the dependent measures are 
taken in terms of two measures of pupil work, 
which are considered to intervene just prior to 
pupil change and which can be more satisfactor- 
ily measured than pupil change. 

The independent variables of teacher behav- 
ior are called inclusive, preclusive, and conjunc- 
tive. The inclusive behaviors tend to make the 
pupils central to the teacher’s classroom deci- 
sions and to the teaching-learning experience; 
the pupils feel that their goals, their abilities, 
their needs are taken into important account; 
the teachers exhibit behaviors that may be 


termed integrative, affiliative, nurturant. Preclusive behaviors of teachers tend to make the pupils peripheral to classroom decisions and experiences; the pupils feel that their needs, their goals, their abilities are frequently overridden by other considerations; the teachers exhibit behaviors that may be termed dominative, aggressive, rejectant. Conjunctive behaviors are those behaviors of the teacher which give evidence of (1) his skill in classroom management, (2) his ability to communicate with the pupils, (3) his command of and ingenuity in dealing with the subject matter, and (4) the level of his demands upon the pupils.

The dependent variables are two measures of 
the pupils’ productivity: (1) the amount of re- 
quired work performed by the pupils, and (2) the 
amount of class-related self-initiated work per- 
formed by the pupils. 

The study relies mainly upon data collected 
from the pupils themselves by means of a questionnaire called the “Pupil Survey.” Scores for
each of the three teacher variables are derived 
from pupil ratings of the frequency with which a 
specified teacher performs certain actions. The 
productivity scores are secured by providing 
checklists to the pupils on which they report the 
frequency with which they perform certain com- 
mon required assignments and engage in various 
self-initiated activities in connection with the 
work in a specified classroom. Each pupil is 
asked to report on two of three teachers (in Eng- 
lish, arithmetic, science). 

Although an effort is made to secure cross- 
checks on the teachers’ behaviors by having each 
principal rate his teachers, and on the pupils’ 
work scores by having each teacher rate each pu- 
pil for the two kinds of work, the major depend- 
ence of this research is upon the pupils’ reports, 
since they are in an excellent position to observe their own work and the behavior of their teachers.

*This is the third article in a series of three. The first is entitled “Theory and Design of a Study of Teacher-Pupil Interaction,” in Harvard Educational Review, XXVI (Fall 1956), pp. 315-42; the second is entitled “The Behavior of Teachers and the Productive Behavior of Their Pupils: I. ‘Perception’ Analysis,” which is published in this issue of the Journal of Experimental Education.

Some representative hypotheses state that the 
teachers’ inclusiveness (I) scores will be posi- 
tively related to the required work (R) scores 
and to the self-initiated work (S) scores. A sim- 
ilar hypothesis is offered for the conjunctive (C) 
variable. The preclusive (P) behaviors of the 
teachers are hypothesized to be negatively relat- 
ed to the required and self-initiated work scores 
of the pupils. 

The sample includes 33 teachers, five princi- 
pals, and 987 eighth-grade pupils in five depart- 
mentalized junior high schools. From these pu- 
pils 1786 usable surveys were obtained. The 
schools are located in two urban communities
(“A” and “B”), having appreciably different
socio-economic characteristics. The population 
of Community A has a rather high average in- 
come and educational level. A relatively large 
proportion of the population is employed in pro- 
fessional or managerial positions. The persons 
living in Community B earn appreciably less on 
the average and have completed fewer years of 
formal schooling. They hold relatively fewer 
professional and managerial positions; more of 
them are employed in skilled, semi-skilled, and 
clerical capacities. 


Analysis of the Data 


This paper is concerned with the “trait” analysis of the data. The central emphasis is upon the observed differences among teachers, i.e., the characterization of the teachers in terms of the averages of the ratings. The averages derived from the pupils’ reports are taken to be the measures of the traits of the teacher, and the statistical treatment of the data is designed to ascertain the relationships between these averaged scores and the averaged pupil productivity. The techniques employed include the computation of the among-groups r’s of the variables and the F-ratios derived from analysis of variance. In these statistical procedures teachers are not treated individually; they are considered collectively as a sample of teachers. The basic data are the average scores of a group, “group” being defined as all the pupils in the sample who have reported on their work with a specified teacher. Most of the pupils were able to complete a survey for two of their teachers.
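The “trait” treatment replaces each pupil's scores by the average for the teacher on whom he reported and then correlates those averages across teachers. A minimal sketch follows, assuming an unweighted product-moment correlation of the group means, one point per teacher (the article does not say whether the among-groups r's were weighted by group size). The survey records and the helper names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def group_means(records):
    """Average two variables over all pupils who reported on the same teacher.
    records: iterable of (teacher_id, x_score, y_score), one per survey."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for teacher, x, y in records:
        s = sums[teacher]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {t: (sx / n, sy / n) for t, (sx, sy, n) in sums.items()}


def pearson_r(xs, ys):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def among_groups_r(records):
    """Correlate the teacher (group) means of the two variables."""
    means = group_means(records)
    xs = [m[0] for m in means.values()]
    ys = [m[1] for m in means.values()]
    return pearson_r(xs, ys)


# Hypothetical surveys: (teacher, inclusive score, self-initiated work score).
surveys = [
    ("T1", 60, 40), ("T1", 70, 50),
    ("T2", 40, 20), ("T2", 50, 30),
    ("T3", 80, 55), ("T3", 90, 65),
]
print(round(among_groups_r(surveys), 3))
```

Because each teacher contributes a single point, the individual differences among a teacher's pupils drop out entirely; that is what distinguishes this “trait” treatment from the pupil-by-pupil “perception” analysis.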


Analysis of Variance 


At the outset, it is useful to determine the an- 
swers to certain questions concerning the five 
variables: 


1. Do the means of the dependent and independent variables taken by groups serve to differentiate among teachers, or is the group not a meaningful classification?

2. Are there school differences in the scores 
on the variables, or may the schools be considered 
as random samples from a single population of 
schools? 

3. Does the community constitute a meaningful unit of analysis, or may the communities be considered as random samples of communities from a single population?

4. Do the means of the criterion variables 
serve to differentiate among teachers of the same 
subject? 


These questions and others of related character 
are examined by means of variance analysis. 
Since the numbers of cases are not equal in differ- 
ent categories of classification, only the simplest 
analysis of variance model is used. Certain im- 
portant questions are left to be answered later by 
the covariance adjustment. 

For each of the following computations the number of degrees of freedom for the sum-of-squares
among the units of the classification is one less 
than the number of units. For the sum-of-squares 
within units of the classification, the number of 
degrees of freedom is equal to the number of pu- 
pils minus the number of units of classification. 

The first step in the variance analysis is to examine the hypotheses that the means of the scores of the groups are equal for each of the variables. The F-test is applied to the scores of each group, and the null hypothesis is:

$$H_0\colon\ \mu_1 = \mu_2 = \cdots = \mu_{33}$$

where $\mu$ is the mean score of a group on a given variable for the teacher indicated by the corresponding subscript. The results of the computations are presented in Table I through Table V.
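The degrees-of-freedom rules and the F-test just described amount to the ordinary one-way analysis of variance. The sketch below computes it both from raw scores and from the summary sums of squares printed in the tables; the function names are hypothetical, and the check uses the figures of Table IV (required work, 33 groups, 1786 usable surveys).

```python
def f_from_summary(ss_among, ss_within, n_units, n_pupils):
    """df among = units - 1 and df within = pupils - units, as stated in the text;
    each variance is a sum of squares divided by its df, and F is their ratio."""
    df_among = n_units - 1
    df_within = n_pupils - n_units
    var_among = ss_among / df_among
    var_within = ss_within / df_within
    return {
        "df": (df_among, df_within, n_pupils - 1),
        "variance": (var_among, var_within),
        "F": var_among / var_within,
    }


def one_way_anova(groups):
    """Among- and within-groups sums of squares from raw scores,
    where groups is a list of lists of pupil scores."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    ss_among = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return f_from_summary(ss_among, ss_within, len(groups), len(scores))


# Table IV: required work, 33 teacher groups, 1786 usable surveys.
table_iv = f_from_summary(141_363.92, 435_887.49, n_units=33, n_pupils=1786)
print(table_iv["df"], round(table_iv["F"], 2))

# A tiny hypothetical example with three groups of three pupils each.
print(round(one_way_anova([[3, 4, 5], [6, 7, 8], [9, 10, 11]])["F"], 2))
```

The Table IV figures give 32 and 1753 degrees of freedom and an F of about 17.77, which agrees with the printed 17.76 to within rounding.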

It is to be noted that there was a large and often complete overlap of membership in the 33 groups. No allowance could be made for this in the computation of the F-ratios over groups. The significance obtained would be an underestimation if pupils scoring high on their own work or on their teacher’s behaviors in one subject also tended to score high on these variables in another subject. (It may be recalled that most pupils completed two “Surveys.”) If, on the other hand, the homogeneity thus hypothesized did not occur, and high scores in one subject were accompanied by low scores in the other, as if by some “law of compensation,” then the level of significance would be an overestimation. The occurrence of this latter phenomenon is considered highly unlikely, especially in view of the results of further analysis of the differences among the group means within each subject area, where the groups are non-overlapping.

The hypothesis that the means of the scores on each of the five variables for all groups are equal is rejected. The scores derived from the surveys discriminate among teachers. This is tantamount to saying that the pupils report differing levels of the three kinds of teacher behavior, and of required and self-initiated work, for different teachers.


TABLE I

ANALYSIS OF VARIANCE OF GROUP MEANS OF INCLUSIVE SCORES

Source of Variation     df     Sum of Squares     Variance       F       P
Among Groups            32       198,105.28       6,190.79     26.41   <.01
Within Groups         1753       410,920.13         234.41
Total                 1785       609,031.41


TABLE II

ANALYSIS OF VARIANCE OF GROUP MEANS OF PRECLUSIVE SCORES

Source of Variation     df     Sum of Squares     Variance       F       P
Among Groups            32       208,712.98       6,522.28
Within Groups         1753       500,569.38         285.55
Total                 1785       709,282.36


TABLE III

ANALYSIS OF VARIANCE OF GROUP MEANS OF CONJUNCTIVE SCORES

Source of Variation     df     Sum of Squares     Variance       F       P
Among Groups            32       114,846.48       3,588.95
Within Groups         1753       397,231.90         226.60
Total                 1785       512,078.38


TABLE IV

ANALYSIS OF VARIANCE OF GROUP MEANS OF REQUIRED WORK

Source of Variation     df     Sum of Squares     Variance       F       P
Among Groups            32       141,363.92       4,417.62     17.76   <.01
Within Groups         1753       435,887.49         248.65
Total                 1785       577,251.41


TABLE V

ANALYSIS OF VARIANCE OF GROUP MEANS OF SELF-INITIATED WORK

Source of Variation     df     Sum of Squares     Variance       F       P
Among Groups            32        68,535.86       2,141.75      9.40   <.01
Within Groups         1753       399,470.86         227.88
Total                 1785       468,006.72


TABLE VI

ANALYSIS OF VARIANCE OF SCHOOL MEANS OF INCLUSIVE SCORES

Source of Variation     df     Sum of Squares     Variance       F       P
Among Schools            4        93,982.66      23,495.66     81.25   <.01
Within Schools        1781       515,048.77         289.19
Total                 1785       609,031.43

The next phase of the analysis is an examina- 
tion of the hypotheses that there are no dif- 
ferences among the school means of pupils on 
the variables. The results of this analysis are 
summarized in Tables VI through X. 

All the F-ratios are significant, and the null 
hypotheses regarding school means are rejected 
(P<.01). In terms of the simple analysis of var- 
iance, the schools vary significantly, and they 
cannot be considered as random samples from 
the same population of schools. 

(Perhaps an anticipatory note is indicated at 
this point. Until an analysis of the hierarchical
relations existing among the three levels of clas- 
sification—teachers, schools, and communities 
—is made, it should be kept in mind that signifi- 
cant differences observed at any one level may 
merely be reflections of differences found at an- 
other level.) 

The question as to whether significant differ- 
ences exist between the means of the variables 
in the two communities is examined next. With 
the possibility in mind that the socioeconomic 
level of the community might prove to be a significant factor in the measurement of the teacher
variables and pupils’ productivity, an effort was 
made to maximize socioeconomic differences in 
the selection of communities. The results of the 
analysis of inter-community differences are sum- 
marized in Table XI through Table XV. 

The communities vary significantly at the .01 
level in scores on the inclusive and required 
work variables. The differences between com- 
munities on the preclusive and conjunctive vari- 
ables are significant at the .05 level; the differ- 
ences are no longer significant on the self-initi- 
ated variable. Since the .01 level was accepted 
as significant prior to the analysis, it may be 
seen that acceptable significant differences 
among communities are limited to two variables. 
At the community level in the hierarchy of clas- 
sifications of this study, there is evidence that 
some distinctions, apparently quite sharp at the 
teacher and school level, are becoming less acute. 
Rather strong evidence presented in a later analysis tends to corroborate this idea.
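The community comparisons can be checked directly from the tables. Each F is the between-communities variance divided by the within-communities variance, with 1 and 1784 degrees of freedom, using the variances printed in Tables XI through XV. In the sketch below the critical values are an approximation: they are obtained by squaring the two-sided normal deviates for .05 and .01. This is justified because an F with 1 and 1784 df is the square of a t with 1784 df, which is very close to normal. The exact F criteria are slightly larger (about 3.85 and 6.65), but no classification here turns on the difference.

```python
from statistics import NormalDist

# Between- and within-communities variances from Tables XI-XV (df = 1 and 1784).
community_variances = {
    "inclusive": (13_480.86, 333.83),
    "preclusive": (1_642.11, 396.66),
    "conjunctive": (1_379.46, 286.27),
    "required work": (10_879.76, 317.47),
    "self-initiated work": (659.88, 261.93),
}

# Large-sample approximation: F(1, df) = t(df) ** 2, and t(1784) is nearly normal.
z = NormalDist()
crit_05 = z.inv_cdf(1 - 0.05 / 2) ** 2   # about 3.84
crit_01 = z.inv_cdf(1 - 0.01 / 2) ** 2   # about 6.63


def classify(f_ratio: float) -> str:
    """Label an F-ratio by the strictest of the two levels it exceeds."""
    if f_ratio > crit_01:
        return ".01"
    if f_ratio > crit_05:
        return ".05"
    return "n.s."


results = {}
for variable, (between, within) in community_variances.items():
    f_ratio = between / within
    results[variable] = classify(f_ratio)
    print(f"{variable:20s} F = {f_ratio:6.2f}  level: {results[variable]}")
```

The output reproduces the pattern described in the text. Inclusive and required work pass the .01 level, preclusive and conjunctive pass only the .05 level, and self-initiated work is not significant.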

No conclusions are drawn at present, there- 
fore, as to the relationship between the socio- 
economic factor and the community scores on 
the variables. It is interesting to note that 
throughout the analyses thus far performed the 
magnitude of the F-ratio has consistently been 
smaller for self-initiated work than for required 
work, although the ratio becomes non-significant 


only at the community level. There would seem to be some evidence that the amount of self-initiated work varies less across the units of classification than does required work. Of the independent variables, the sharpest differences are found in the inclusive scores, followed in order by the preclusive and the conjunctive scores.

Finally, the analysis of variance is computed 
for each subject area. The question to be an- 
swered is whether there still exist differences 
among the several group means of the variables 
within a given sub-class of teachers, i.e., those 
instructing in English, arithmetic, or science. 

Two special conditions applying to these subclasses deserve mention. First, the subject classifications cut across all schools and communities, with the exception of science (which is limited to three schools within a single community). The subjects do not, therefore, constitute a fourth level of the three-stage hierarchy already described. Second, since the memberships of the groups within each subject class do not overlap, the qualifications as to overlapping memberships that were appended to the analysis for all groups do not apply here. There is, then, no reason to take into account the possible effects of shared group membership upon the significance of observed differences in group means within each subject area. Table XVI through Table XX summarize the results of the variance analysis.
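Because the groups within a subject do not overlap, the degrees of freedom reported for the subject-area analyses should account for every survey exactly once. That bookkeeping can be checked from the df of Table XVI. In each subject, the number of groups is the among-groups df plus one, and the number of surveys is the total df plus one. The sketch below performs only that arithmetic check; the variable names are hypothetical.

```python
# Degrees of freedom (among, within) for each subject area, from Table XVI.
subject_df = {
    "English": (17, 747),
    "Arithmetic": (10, 695),
    "Science": (3, 311),
}

total_groups = total_surveys = 0
for subject, (df_among, df_within) in subject_df.items():
    groups = df_among + 1                  # df among = groups - 1
    surveys = df_among + df_within + 1     # df total = surveys - 1
    total_groups += groups
    total_surveys += surveys
    print(f"{subject:10s} groups = {groups:2d}  surveys = {surveys}")

# Together the three subject areas should reproduce the all-groups
# analysis of Tables I through V (33 groups, 1786 surveys).
print(total_groups, total_surveys)
```

The three subject areas come to 18, 11, and 4 groups, with 765, 706, and 315 surveys. This gives 33 groups and 1786 surveys, so the subject classes partition the full set of surveys with no survey counted twice.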

All the F-ratios for group means of the scores 
within each subject area are significant at the .01 
level. 

In terms of the responses of pupils on the “Pupil Survey,” these conclusions seem warranted:


1. The simple analysis of variance, without covariance adjustment and without reference to the impact of levels of the hierarchy upon each other, shows significant differences in the means of all the variables for all groups together, for all groups in each subject area, and for all schools.

2. In the analysis of variance between communities, only the inclusive and required work scores achieve significance at the .01 level.

3. The teachers appear to be significantly different with respect to traits of inclusiveness, preclusiveness, and conjunctivity when the teachers are characterized in terms of the pupils’ reports.

4. The pupils report differing amounts of required and self-initiated work for different teachers.


TABLE VII

ANALYSIS OF VARIANCE OF SCHOOL MEANS OF PRECLUSIVE SCORES

Source of Variation     df     Sum of Squares     Variance       F       P
Among Schools            4        78,307.53      19,576.88     55.26   <.01
Within Schools        1781       630,974.82         354.28
Total                 1785       709,282.36


TABLE VIII

ANALYSIS OF VARIANCE OF SCHOOL MEANS OF CONJUNCTIVE SCORES

Source of Variation     df     Sum of Squares     Variance       F       P
Among Schools            4        23,153.76       5,788.44     21.09   <.01
Within Schools        1781       488,924.62         274.52
Total                 1785       512,078.38


TABLE IX

ANALYSIS OF VARIANCE OF SCHOOL MEANS OF REQUIRED WORK

Source of Variation     df     Sum of Squares     Variance       F       P
Among Schools            4        31,877.13       7,969.28     26.02   <.01
Within Schools        1781       545,372.81         306.22
Total                 1785       577,249.94


TABLE X

ANALYSIS OF VARIANCE OF SCHOOL MEANS OF SELF-INITIATED WORK

Source of Variation     df     Sum of Squares     Variance       F       P
Among Schools            4        17,803.43       4,450.86
Within Schools        1781       450,140.29         252.75
Total                 1785       467,943.72


TABLE XI

ANALYSIS OF VARIANCE OF COMMUNITY MEANS OF INCLUSIVE SCORES

Source of Variation       df     Sum of Squares     Variance       F       P
Between Communities        1        13,480.86      13,480.86
Within Communities      1784       595,549.57         333.83
Total                   1785       609,031.43


TABLE XII

ANALYSIS OF VARIANCE OF COMMUNITY MEANS OF PRECLUSIVE SCORES

Source of Variation       df     Sum of Squares     Variance       F       P
Between Communities        1         1,642.11       1,642.11
Within Communities      1784       707,640.25         396.66
Total                   1785       709,282.36


TABLE XIII

ANALYSIS OF VARIANCE OF COMMUNITY MEANS OF CONJUNCTIVE SCORES

Source of Variation       df     Sum of Squares     Variance       F       P
Between Communities        1         1,379.46       1,379.46
Within Communities      1784       510,698.92         286.27
Total                   1785       512,078.38


TABLE XIV

ANALYSIS OF VARIANCE OF COMMUNITY MEANS OF REQUIRED WORK

Source of Variation       df     Sum of Squares     Variance       F       P
Between Communities        1        10,879.76      10,879.76     34.21   <.01
Within Communities      1784       566,370.18         317.47
Total                   1785       577,249.94


TABLE XV

ANALYSIS OF VARIANCE OF COMMUNITY MEANS OF SELF-INITIATED WORK

Source of Variation       df     Sum of Squares     Variance       F       P
Between Communities        1           659.88         659.88
Within Communities      1784       467,283.84         261.93
Total                   1785       467,943.72


TABLE XVI

ANALYSIS OF VARIANCE OF GROUP MEANS OF INCLUSIVE SCORES
WITHIN EACH SUBJECT AREA

Source of Variation      df    Sum of Squares    Variance        F      P
English
  Among Groups           17         91,438.51    5,378.74     9.78   <.01
  Within Groups         747        410,926.13      550.10
  Total                 764        502,364.64
Arithmetic
  Among Groups           10         79,888.56    7,988.86    36.33   <.01
  Within Groups         695        152,823.18      219.89
  Total                 705        232,711.74
Science
  Among Groups            3         13,394.05    4,464.68    16.19   <.01
  Within Groups         311         85,781.14      275.82
  Total                 314         99,175.19


TABLE XVII

ANALYSIS OF VARIANCE OF GROUP MEANS OF PRECLUSIVE SCORES
WITHIN EACH SUBJECT AREA

Source of Variation      df    Sum of Squares    Variance        F      P
English
  Among Groups           17         86,345.88    5,079.17    22.65   <.01
  Within Groups         747        167,480.66      224.20
  Total                 764        253,826.54
Arithmetic
  Among Groups           10         88,120.85    8,812.08    26.68   <.01
  Within Groups         695        229,551.74      330.29
  Total                 705        317,672.59
Science
  Among Groups            3         26,850.82    8,950.27    26.88   <.01
  Within Groups         311        103,536.98      332.92
  Total                 314        130,387.80


TABLE XVIII

ANALYSIS OF VARIANCE OF GROUP MEANS OF CONJUNCTIVE SCORES
WITHIN EACH SUBJECT AREA

Source of Variation    Sum of Squares    Variance
English
  Among Groups                 72,989    4,293.49
  Within Groups               169,287      226.62
  Total                       242,277
Arithmetic
  Among Groups                 25,050    2,505.06
  Within Groups               140,972      202.84
  Total                       166,023
Science
  Among Groups                 13,347    4,449.14
  Within Groups                86,971      279.65
  Total                       100,318


TABLE XIX

ANALYSIS OF VARIANCE OF GROUP MEANS OF REQUIRED WORK
WITHIN EACH SUBJECT AREA

Source of Variation    Sum of Squares    Variance
English
  Among Groups                 45,278    2,663.45
  Within Groups               215,921      289.05
  Total                       261,199
Arithmetic
  Among Groups                 52,252    5,225.27
  Within Groups               116,623      167.80
  Total                       168,875
Science
  Among Groups                 14,386    4,795.46
  Within Groups               103,343      332.29
  Total                       117,729


TABLE XX

ANALYSIS OF VARIANCE OF GROUP MEANS OF SELF-INITIATED WORK
WITHIN EACH SUBJECT AREA

Source of Variation    Sum of Squares    Variance
English
  Among Groups              28,382.69    1,669.57
  Within Groups            198,283.12      265.44
  Total                    226,665.81
Arithmetic
  Among Groups                 12,371    1,237.16
  Within Groups               108,132      155.59
  Total                       120,504
Science
  Among Groups                  3,937    1,312.59
  Within Groups                92,992      299.01
  Total                        96,930


TABLE XXI

VALUES OF F FOR REQUIRED AND SELF-INITIATED WORK

                        All Teachers    Teachers in      Teachers in
                                        Community A      Community B
                        R               R                R

Unadjusted F Value*     29.01  17.18
Adjusted F Value*       20.58  11.65

*All significant at .01 level.


R
3.79


Covariance Adjustment 


As will be seen in a later subsection of this article, there is a positive correlation between each of the criterion variables and the inclusive variable. The simple analysis of variance makes no statistical allowance for the fact that the groups differ on required and self-initiated work scores and also on inclusive scores, and that these variables are correlated. It therefore becomes a matter of interest to determine whether the criterion scores differ significantly over and above their variation related to variation in inclusive scores. The covariance adjustment makes statistical allowance for such uncontrolled differences.

When the covariance adjustment is made for required and self-initiated group scores, with inclusiveness controlled in each instance, F is significant for both criterion variables for all teachers in the sample and for all teachers in each community. It may therefore be said that the groups differ in the amounts of required and self-initiated work they report, over and above what would be expected because of the influence of the teachers’ inclusiveness. For purposes of comparison, the adjusted and unadjusted F’s are presented in Table XXI.

The following observations may be made concerning the data in Table XXI. The results are not unanticipated. The theory of this research does not hypothesize that differences in the group means on the criterion variables are determined solely, or even in greatest part, by the group’s perceptions of the teacher’s inclusiveness. It seems fairly evident that, in using intact groups of pupils without experimental pairing of productivity scores, the group means of work scores would tend to be appreciably different because of some homogeneous grouping into classes on the basis of previous record of achievement or other informal, administrative criteria.

It may also be worthy of note that the F-ratios of the unadjusted variances are consistently higher than the adjusted F values, although all the values are significant. This fact, coupled with the fact that the adjusted r’s computed from within-groups scores for $r_{IR}$ and $r_{IS}$ are significant, may indicate that some portion of the differences in the amounts of work reported by groups may be attributed to differences in the groups’ perceptions of the teachers’ inclusiveness.
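The covariance adjustment can be illustrated with a textbook one-way analysis of covariance. The sketch below is not the author's computation: it assumes a single covariate x (standing in for a pupil's inclusive score), a criterion y (required or self-initiated work), one common within-group regression slope, and toy data invented for illustration. It returns the unadjusted and the covariance-adjusted F so the two can be compared in the manner of Table XXI.

```python
def _centered_sums(pairs, mean_x, mean_y):
    """Sums of squares and cross-products of (x, y) pairs about the given means."""
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxx, syy, sxy


def ancova_f(groups):
    """Return (unadjusted F, covariance-adjusted F) for the criterion y.

    groups: list of groups (e.g. one teacher's class each), each a list of
    (x, y) pairs, x being the covariate and y the criterion score.
    """
    pairs = [p for g in groups for p in g]
    n, k = len(pairs), len(groups)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    txx, tyy, txy = _centered_sums(pairs, mean_x, mean_y)

    # Pool the within-group sums of squares and cross-products.
    wxx = wyy = wxy = 0.0
    for g in groups:
        gx = sum(x for x, _ in g) / len(g)
        gy = sum(y for _, y in g) / len(g)
        sxx, syy, sxy = _centered_sums(g, gx, gy)
        wxx, wyy, wxy = wxx + sxx, wyy + syy, wxy + sxy

    # Ordinary one-way F on y alone.
    f_unadjusted = ((tyy - wyy) / (k - 1)) / (wyy / (n - k))

    # Remove the part of y linearly predictable from x, then re-test.
    adj_total = tyy - txy ** 2 / txx
    adj_within = wyy - wxy ** 2 / wxx  # one more df is spent on the pooled slope
    f_adjusted = ((adj_total - adj_within) / (k - 1)) / (adj_within / (n - k - 1))
    return f_unadjusted, f_adjusted


toy_groups = [[(1, 2), (2, 3), (3, 5)], [(4, 6), (5, 8), (6, 9)]]
print(ancova_f(toy_groups))  # unadjusted about 12.07, adjusted about 0.086
```

In this made-up example almost all of the between-group difference in y is predictable from x, so the adjusted F collapses; in Table XXI the adjusted values fall below the unadjusted ones but remain significant.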


Analysis of Variance for Hierarchical Classifications


In the variance analysis of these data, a problem arises as to the treatment of observations belonging to teachers who belong to schools which in turn belong to a community. Professor John B. Carroll of the Harvard Graduate School of Education has suggested what may be a new departure in the treatment of such problems. His description of the statistical method follows:*

*This new application of analysis of variance was developed in connection with this paper and is not yet available in other publications.


“There is a possibility that a new application of analysis of variance may be of some utility in dealing with hierarchical classifications. Suppose one has N observations which belong to classes $A_1, A_2, \ldots, A_a$, which in turn belong to classes $B_1, B_2, \ldots, B_b$ ($b < a$), which in turn belong to classes $C_1, C_2, \ldots, C_c$ ($c < b$), etc. (It would be preferable that these inequalities should be $b < \tfrac{1}{2}a$, $c < \tfrac{1}{2}b$, etc., in order to ensure that each class has at least two sub-classes.) The number of observations in any class may be any number equal to or greater than two, and the classes may be of unequal size. We shall speak of the various classifications A, B, C, ..., M as different levels of classification.

The following specification equation can be established for the ith observation:

$$X_i - \bar{X}_t = (X_i - \bar{X}_{Ai}) + (\bar{X}_{Ai} - \bar{X}_{Bi}) + \cdots + (\bar{X}_{Mi} - \bar{X}_t) \qquad (1)$$

where $\bar{X}_t$ is the grand mean for all N observations, and $\bar{X}_{Ai}, \bar{X}_{Bi}, \ldots, \bar{X}_{Mi}$ are the means of the A-, B-, ..., M-classes in which the ith observation is found.

If we square and sum both sides of equation (1) over all observations, all the cross-product terms vanish, and hence

$$\sum_{i=1}^{N} (X_i - \bar{X}_t)^2 = \sum_{i=1}^{N} (X_i - \bar{X}_{Ai})^2 + \sum_{i=1}^{N} (\bar{X}_{Ai} - \bar{X}_{Bi})^2 + \cdots + \sum_{i=1}^{N} (\bar{X}_{Mi} - \bar{X}_t)^2 \qquad (2)$$

Therefore each term on the right-hand side of (2), when divided by an appropriate number of degrees of freedom, yields an independent estimate of population variance. These estimates may be compared by the usual F-ratio procedure in testing hypotheses regarding the equality of population variances associated with each level of classification. The resulting analysis of variance table [is shown at the top of page 119].”
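Equation (2) can be checked numerically. The sketch below is an illustration only, not Professor Carroll's or the author's procedure in code: it assumes each observation carries a path such as (community, school, teacher), uses observation-weighted class means, returns the sums of squares in the order of Carroll's table (highest level first, within-classes last), and forms each F-ratio by dividing a level's variance estimate by that of the level just below it, as in Tables XXII through XXVI. The degrees of freedom 1, 3, 28 and 1753 in the usage line are inferred by dividing each sum of squares in Table XXII by its printed variance estimate.

```python
from collections import defaultdict


def nested_sums_of_squares(observations):
    """Split the total sum of squares as in equation (2).

    observations: list of (score, path) pairs, where path runs from the highest
    level down, e.g. (community, school, teacher). Returns sums of squares with
    the highest level first and the within-lowest-level term last.
    """
    n = len(observations)
    depth = len(observations[0][1])
    grand_mean = sum(x for x, _ in observations) / n

    # Class means at each level, keyed by the path prefix that names the class.
    means = []
    for level in range(depth):
        totals, counts = defaultdict(float), defaultdict(int)
        for x, path in observations:
            totals[path[: level + 1]] += x
            counts[path[: level + 1]] += 1
        means.append({key: totals[key] / counts[key] for key in totals})

    ss = [0.0] * (depth + 1)
    for x, path in observations:
        upper_mean = grand_mean
        for level in range(depth):
            class_mean = means[level][path[: level + 1]]
            ss[level] += (class_mean - upper_mean) ** 2
            upper_mean = class_mean
        ss[depth] += (x - upper_mean) ** 2  # within the lowest-level classes
    return ss


def nested_degrees_of_freedom(observations):
    """m - 1, ..., a - b, N - a: differences in the number of classes per level."""
    depth = len(observations[0][1])
    classes = [len({path[: level + 1] for _, path in observations}) for level in range(depth)]
    df = [classes[0] - 1] + [classes[i] - classes[i - 1] for i in range(1, depth)]
    return df + [len(observations) - classes[-1]]


def successive_f_ratios(ss, df):
    """Each level's variance estimate divided by that of the level just below it."""
    variance = [s / d for s, d in zip(ss, df)]
    return [variance[i] / variance[i + 1] for i in range(len(variance) - 1)]


# Table XXII (inclusive scores); df inferred as sum of squares / variance estimate.
print([round(f, 2) for f in successive_f_ratios(
    [13481.86, 80500.80, 104122.62, 410926.13], [1, 3, 28, 1753])])  # [0.5, 7.22, 15.86]
```

The three ratios reproduce the F column of Table XXII (.50, 7.22, 15.86), which supports reading those F's as successive level-by-level comparisons.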


The analysis of variance by the hierarchies of 
the data of this study, following Professor 
Carroll’s model, is presented for all variables 
on the pages following. 

(Professor Carroll’s Analysis of Variance Table)

Source of Variance    df       Sum of Squares                                       Variance Estimate
Between M-classes     m - 1    $\sum_{i=1}^{N} (\bar{X}_{Mi} - \bar{X}_t)^2$        By division
...                   ...      ...                                                  ...
Between B-classes     b - c    $\sum_{i=1}^{N} (\bar{X}_{Bi} - \bar{X}_{Ci})^2$     By division
Between A-classes     a - b    $\sum_{i=1}^{N} (\bar{X}_{Ai} - \bar{X}_{Bi})^2$     By division
Within A-classes      N - a    $\sum_{i=1}^{N} (X_i - \bar{X}_{Ai})^2$              By division
Total                 N - 1    $\sum_{i=1}^{N} (X_i - \bar{X}_t)^2$


TABLE XXII

ANALYSIS OF VARIANCE BY HIERARCHIES, FOR INCLUSIVE SCORES

Source of Variation      df    Sum of Squares    Variance Estimate        F      P
Between Communities       1         13,481.86            13,481.86      .50   >.05
Among Schools             3         80,500.80            26,833.60     7.22   <.01
Among Teachers           28        104,122.62             3,718.66    15.86   <.01
Within Teachers        1753        410,926.13               234.41
Total                  1785        609,031.41               341.19


TABLE XXIII

ANALYSIS OF VARIANCE BY HIERARCHIES, FOR PRECLUSIVE SCORES

Source of Variation      df    Sum of Squares    Variance Estimate        F      P
Between Communities       1          1,642.11             1,642.11      .06   >.05
Among Schools             3         76,665.42            25,555.14     5.49   <.01
Among Teachers           28        130,405.45             4,657.34
Within Teachers        1753        500,569.38               285.55
Total                  1785        709,282.36               397.36


TABLE XXIV

ANALYSIS OF VARIANCE BY HIERARCHIES, FOR CONJUNCTIVE SCORES

Source of Variation      df    Sum of Squares    Variance Estimate        F      P
Between Communities       1          1,379.46             1,379.46            >.05
Among Schools             3         21,774.30             7,258.10     2.22   >.05
Among Teachers           28         91,692.72             3,274.74    14.45
Within Teachers        1753        397,231.90               226.60
Total                  1785        512,078.38               286.88


From Tables XXII through XXVI, it may be seen that the F-ratios attain significance for every variable in the comparison of the among-teacher means with the within-teacher (actually the among-pupil) means.

The hierarchical comparison between teachers and schools yields an F significant at the .01 level only for inclusive and preclusive scores.

None of the variables attains a significant F in the comparison of school variances with community variances.

With the exception of the preclusive and inclusive variables compared by teachers and schools, only the among-teachers with within-teachers (among-pupils) comparison remains significant after the correction provided by the analysis of variance for hierarchies. It appears that the consistent and genuine differences observed are those among teachers, and the analysis gives evidence that the significant differences noted among the schools and communities in the simple analysis of variance are, with the two exceptions noted above, no more than would be expected in view of the differences among teachers. The demonstration gives some ground for saying that schools and communities do differ, but that the differences are attributable to teachers rather than to intrinsic school or community differences. It seems fair to conclude that the pupils’ reports on the “Survey” serve to differentiate sharply among teachers on all the variables of this study, and that the observed significant differences among schools and between communities need to be qualified in the light of the evidence provided by the hierarchical analysis of variance.

If these findings are kept in mind, then it may perhaps be said that the socioeconomic level of the community can be viewed as a factor in the amount of work reported by the pupils of that community to the degree that the differences among teachers employed by the community reflect that socioeconomic level.


Standard Error of Measurement and Reliability of the Scales


In the preceding computations an estimate of variance within groups was derived for each group. The square root of this error estimate provides a measure of standard error for each of the five scales. The reliability coefficient for each scale is also computed for the average score assigned to a teacher by a group. The formula for the coefficient of this average score by a group is


where subscripts w and a denote the within-teachers variance estimate and the among-teachers estimate, respectively. Table XXVII presents the results of the computation.
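As a check on Table XXVII, the standard errors can be recomputed from the printed within-teacher variance estimates. This small sketch covers only the standard error (the square root of the within-teacher variance estimate, as stated above); it does not attempt the reliability coefficient. The dictionary simply copies the five variance estimates from the table.

```python
import math

# Within-teacher variance estimates for the five scales, copied from Table XXVII.
within_variance = {"I": 234.41, "P": 285.55, "C": 226.60, "R": 248.65, "S": 227.84}


def standard_error(variance_within):
    """Standard error of measurement as the square root of the error variance."""
    return math.sqrt(variance_within)


for scale, variance in within_variance.items():
    print(scale, round(standard_error(variance), 2))
```

Rounded to two places, the printed values come out as 15.31, 16.90, 15.05, 15.77 and 15.09, matching the table.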

The reliability coefficients for group assessments are quite substantial. There is some reason to conclude that the five scales furnish reliable measurements of the teacher traits and of pupil productivity when these measurements are computed from all the scores of the pupils reporting on a given teacher. In view of these findings it would appear that the scales of the “Pupil Survey” may be used with some confidence for the characterization of teachers in terms of the traits examined in the course of this research.


The Correlational Analysis 


The variables have been subjected to two kinds of correlational analysis: (1) the “perception” analysis, and (2) the trait analysis. The results of the perception analysis are reported in detail in another article. It may be relevant, however, to give a few indications of the major elements of this work.

Correlations were computed for all of the variables with each other within each group. Thus, for each combination of variables there were 33 coefficients. Each of these arrays of coefficients was tested for significance by the application of the sign or binomial series test; this served to establish the presence or absence of a trend in the relationships between the members of the paired variables, without reference to the magnitude of the trend. The t-test was then applied in order to obtain an estimate of the significance of the relationships between the paired variables within each group. In one sense these procedures might be viewed as 33 replications of the research, with the significance of the coefficient of correlation established for each administration (i.e., in each group).
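The two tests named above can be sketched in a few lines. The article does not say whether the sign test was one- or two-sided or exactly which count was tested, so the one-sided form and the counts in the usage lines are assumptions for illustration; the t statistic is the standard test of a Pearson r against zero with n - 2 degrees of freedom.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero; df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def sign_test_p(n_positive, n_total):
    """One-sided sign (binomial) test: P(X >= n_positive), X ~ Binomial(n_total, 1/2)."""
    tail = sum(math.comb(n_total, k) for k in range(n_positive, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical illustration: 30 of the 33 within-group coefficients positive.
print(sign_test_p(30, 33))
# Hypothetical class of 27 pupils with r = .50, tested on 25 df.
print(round(t_for_r(0.50, 27), 3))  # 2.887
```

The sign test looks only at the direction of the 33 coefficients, while the t-test weighs the size of each one, which is why the article uses the first to establish a trend and the second to judge each group's coefficient.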

The results were inconclusive for the preclusive variable with both the required work and the self-initiated work scores. There was strong evidence, however, that the inclusive and the conjunctive variables were each related to each of the consequents, the relationship of the inclusive scores to the self-initiated work scores exhibiting the greatest stability (25 of the 33 coefficients attained significance at or beyond the .05 level).

The findings as to the correlation of the inclusive scores with required and self-initiated work scores suggested a similar examination for the total group as a single sample and for the pupils in each community separately. To avoid the spuriousness that might result from throwing together groups having unlike means, the r’s were computed from the within-sums, which had been obtained in the course of an analysis of variance with covariance adjustment. The total r’s, the between-group r’s, and the within-group r’s are presented in Table XXVIII.


TABLE XXV

ANALYSIS OF VARIANCE BY HIERARCHIES, FOR REQUIRED WORK SCORES

Source of Variation      df    Sum of Squares    Variance Estimate        F      P
Between Communities       1         10,879.76            10,879.76     1.56   >.05
Among Schools             3         20,997.37             6,999.12     1.79   >.05
Among Teachers           28        109,486.79             3,910.24    15.73   <.01
Within Teachers        1753        435,887.49               248.65
Total                  1785        577,251.41               323.39


TABLE XXVI

ANALYSIS OF VARIANCE BY HIERARCHIES, FOR SELF-INITIATED WORK SCORES

Source of Variation      df    Sum of Squares    Variance Estimate        F      P
Between Communities       1            659.88               659.88     0.12   >.05
Among Schools             3         17,143.55             5,714.52     3.15   <.05
Among Teachers           28         50,732.43             1,811.87     7.95   <.01
Within Teachers        1753        399,407.86               227.84
Total                  1785        467,943.72               262.15


TABLE XXVII

STANDARD ERROR OF MEASUREMENT AND RELIABILITY COEFFICIENTS OF THE SCALES

                              I            P            C            R            S
Range of Scores          23 to 115    26 to 130    26 to 130    1 to 150     0 to 120
Variance Estimate           234.41       285.55       226.60       248.65       227.84
Standard Error               15.31        16.90        15.05        15.77        15.09
Reliability Coefficient
  for Group Assessment        .962         .956         .937         .944         .894


TABLE XXVIII

CORRELATION COEFFICIENTS OF INCLUSIVENESS WITH REQUIRED WORK SCORES
AND SELF-INITIATED WORK SCORES, FOR THE TOTAL SAMPLE AND
FOR EACH COMMUNITY

                          df    $r_{IR}$    P          df    $r_{IS}$    P
Total Sample
  Total r
  Among Groups r          31    .63         <.01       31    .62         <.01
  Within Groups r       1752                <.01     1752    .35         <.01
Community A
  Total r                926    .36         <.01      926                <.01
  Among Groups r          12    .64         <.01       12    .63         <.01
  Within Groups r        913                <.01      913    .30         <.01
Community B
  Total r
  Among Groups r          17                <.025      17                <.01
  Within Groups r        838                <.01      838    .39         <.01

An examination of Table XXVIII reveals that all the r’s except the among-groups $r_{IR}$ of Community B are significant at the .01 level. The single exception is significant at the .025 level.


With 31 degrees of freedom for the among-groups r’s of Table XXVIII, all these coefficients are significant at the .01 level. It may be said, then, that the average scores of the groups for teachers’ inclusiveness are significantly and positively related to the average scores of the groups on required work and on self-initiated work. There appears to be some evidence from scores on the “Pupil Survey” that, in the operational terms of this research, inclusiveness is an observable and measurable trait of the teachers in this sample, and that the degree to which this trait is reported is related to the degree of productivity reported by the pupils. Since the conjunctive scores appear to be quite consistent with and closely related to the inclusive scores, this last analysis was not performed for the conjunctive variable.


Summary and Conclusions 


The analysis of variance reveals that the five scales are capable of differentiating sharply among groups (teachers), and that although those differences extend up through the two communities, each taken separately as a single sample, and through the total sample population taken as a single group, the genuine differences are those among the teachers, the other steps of the hierarchy merely reflecting these differences. The significant differences noted among the teachers continue to appear when the analysis of variance is performed among the teachers of each subject separately.

The reliability of each scale is quite substantial, the coefficients ranging from .89 to .96 for the five variables.

The average scores on the teachers’ inclusiveness are significantly related to the average productivity scores for both of the criterion measures of pupil work. It seems fair to conclude that, as measured in this research, the teachers’ inclusiveness is an observable and measurable trait of teachers and that it is related to the amounts of the pupils’ required and self-initiated work scores.


JOURNAL OF EXPERIMENTAL EDUCATION
(Volume 27, December 1958)


THE USE OF FREE RESPONSE DATA IN
WRITING CHOICE-TYPE ITEMS


DESMOND L. COOK


Purdue University


PURPOSE


MOST TEXTBOOKS on educational measurement present rules or suggestions for constructing objective test items of various types. One suggestion often made is that the distracters (wrong responses) for choice-type items should be plausible. Applying this rule requires that an item writer have some notion of which alternatives are likely to serve as efficient distracters. For an experienced item writer this selection of plausible distracters may be relatively easy. A relatively inexperienced item writer, however, may have difficulty devising plausible wrong choices.

Some writers on educational test construction have suggested using students’ responses to items set up in free response form as a source of possible distracters. In general, these writers suggest that using student responses to free response items as a source of distracters for a choice-type item often results in a better item than if such data are not used.

Although the use of free response data has been recommended by many authors of textbooks in educational measurement, research studies designed to evaluate the procedure have been relatively few. Kelley (1) in 1937 used free response data in constructing a vocabulary test and concluded that the procedure was questionable in view of the work involved and the results obtained. Frederiksen and Satter (2) in 1953 reported on the use of free response data in constructing arithmetic problems and found the method was of some help. They did not, however, make any direct comparison between items constructed with and without knowledge of such data.

The results of these two research studies suggest that the value of free response data is questionable. In addition, the procedure has apparently been applied only to such factual school subjects as vocabulary and arithmetic, where it is relatively easy to define incorrect answers. As far as could be determined, no experimental evidence has been obtained in “content” areas such as the social studies. Furthermore, previous studies appear to have had only one person doing the item writing. As the method of using free response data is offered as a general suggestion for all item writers, it is possible that the results of previous studies have been largely determined by the ability of a particular writer to use the free response data, rather than by the general value of the method itself. In view of the above considerations and the rather limited amount of research on the general problem, further study of this problem seems warranted, particularly with regard to item writer differences in areas where free response answers are not so easily categorized.

The purpose of this study was to determine whether or not choice-type items written with the aid of information on student free responses could be more discriminating than items written without the use of such information. If items written with the aid of free response data are more highly discriminating than those written without such help, the use of free response data would be considered valuable.


PROCEDURES 


Selection of Items and Writers 


To provide a sample of students’ wrong answers, three tests, each composed of thirty free response items, in the area of Contemporary Affairs for 1953 were administered as part of the freshman testing program to 720 men entering the State University of Iowa in September 1953. Each of the 90 items appeared as a direct question which the students were asked to answer in one or a few words (not more than one sentence).

This area and these students were chosen for several reasons. First, the subject matter involved and the type of question used to measure the student’s grasp of it were broadly representative of those encountered in a variety of subjects of study at the college level. Results of an investigation in this area could be generalized more widely than if a more restricted or specialized area had been chosen.

Second, since a sample of trained item writers was required to test the usefulness of free response data, it was necessary to choose an area in which training in item writing, rather than specialized subject matter competence, could be the determining factor in the choice of item writers.

Finally, it was desired to choose an area in which the tests could be administered under natural conditions as part of a purposeful testing program involving a large number of well-motivated students. It was also decided to conduct the testing under the close supervision of the author. These conditions seemed best met by administering a test in Contemporary Affairs to students entering the College of Liberal Arts at the State University of Iowa.

The three tests were scored and item analysis data obtained separately for each test. The sixty most discriminating items among the ninety were selected for final use in studying the usefulness of free response data.

For each of the sixty selected items, the responses which had been marked incorrect in the original scoring were listed until ten different wrong answers had been listed, or until all the responses to a particular item had been examined. All of the listing of incorrect answers was done by the author.

Wrong answers which expressed the same general idea were not listed separately. Decisions concerning which wrong answers to list separately required subjective judgment. For some items, the differences between the various incorrect responses were readily established. Thus the incorrect responses to an item calling for the name of a book, movie, or person were easy to categorize. On the other hand, wrong responses to items dealing with the reasons for or effects of an action, or with the attitude held by a particular group, were more difficult to categorize. The list of wrong answers to Item 1, for example, is presented below. The number after each incorrect answer is the frequency with which that answer occurred in the papers examined to secure ten wrong answers.


1. Soldiers from what countries opposed United Nations forces in the Korean fighting?
   Answer: North Korea and China

   Wrong answer                                                    Frequency
   Communist China and Communist Russia
   Korea, China, and Russia
   North Korea, Indochina
   Red China
   North Korean soldiers and also Manchuria
   North Korea
   Russia and North Korea
   China and Korea
   China and Manchuria
   Soldiers from North Korea opposed the UN. Russia helped
     with supplies and planes                                      1
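The listing rule described above (tally wrong answers in the order the papers are examined, stopping once ten different ones have been found or the papers run out) can be sketched as a simple scan. This is an illustration under stated assumptions: `normalize` is a hypothetical stand-in for the author's subjective judgment of which answers "expressed the same general idea", and `is_correct` stands in for the original scoring; the sample responses are invented.

```python
from collections import Counter


def tabulate_wrong_answers(responses, is_correct, normalize, limit=10):
    """Tally distinct wrong answers in the order the papers are examined.

    Stops as soon as `limit` different wrong answers have been found, or when
    the responses run out. `normalize` maps a response to a category key
    (a hypothetical stand-in for human judgment). The counts are frequencies
    among the papers examined before stopping.
    """
    counts = Counter()
    for response in responses:
        if len(counts) == limit:
            break  # enough different wrong answers have been listed
        if is_correct(response):
            continue
        counts[normalize(response)] += 1
    return counts


papers = ["North Korea and China", "Red China", "red china", "North Korea"]
print(tabulate_wrong_answers(papers, lambda r: r == "North Korea and China",
                             lambda r: r.strip().lower(), limit=2))
```

Because the scan stops as soon as the tenth distinct answer appears, the last answer listed will always have a frequency of 1, as in the Item 1 list.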


The use of free response data is offered as a general suggestion to all item writers. It may be, however, that some writers are helped more than others by such data. To demonstrate that the method is of value as a general procedure, it was decided to use a group of item writers. Six item writers were used in the present study, each writing twenty items, ten with and ten without the help of free response data. This arrangement represented a compromise between the desirability of having a large group of item writers and having each writer work with a large number of items.

The six item writers used in this study were selected from the better students in a graduate-level introductory class in test construction. All of the writers had received some instruction in item writing. They may be regarded as representative of the large group of potential item writers to which recommendations concerning use of free response data would normally be addressed.


Construction of the Tests 


After the free response data had been tabulated and the writers selected, the next step was to secure items written under the two methods of the problem, that is, with and without the use of free response data.

Each writer first constructed a set of ten four-response multiple-choice items on ten assigned questions without knowledge of the free response data. Each writer then constructed ten additional items on ten different assigned questions with knowledge of the free response data. Under the conditions of the experiment, it would have been possible for each writer to use the same ten questions for writing both sets of items. This would balance out experimental errors associated with possible differences in the writer’s background for handling different sets of questions. But it might also introduce a bias for or against the items written in the second phase (with knowledge of free response data). Since an item writer would ordinarily deal with the same question only once, and since the second source of error could not be estimated in this experiment, it was decided to assign a different set of items to each writer for the second phase of the item writing.

The first set of questions was assigned to each item writer by arranging their names in random order and designating the first as writer A, the second as writer B, and so on. The items assigned to A were the first ten items of the final sixty items selected from the free response test. Writer B received the next ten, and so on.




A set of materials was prepared for each item writer. It contained a sheet of directions outlining the general nature of the study and the procedures to be followed, a list of items to be written, and several sheets of paper on which to write the items. The list of items consisted of ten item stems in question form and the corresponding ten correct answers. The directions told the writers to construct four-response choice-type items using the item stem and correct answer presented, and to provide three effective distracters. The item writers were permitted to modify the item stem or the correct response if they felt it would improve the item, but they were asked to hold such changes to a minimum.

Upon completion of this first set of items, each writer was given another set of items to write. The ten items on this second set were different from those on the first set. The assignment of the second set of items was made by randomly drawing ten items from those remaining after the items previously written had been excluded.

The second set of materials consisted of a direction sheet, a set of ten item cards, and paper on which to write the items. Each card carried an item stem, the correct answer, and the list of previously tabulated wrong responses. The directions were essentially the same as for the first set except that the writers were told that they could make any use of the free response data they wished. The writers were also asked to write a short statement expressing their opinion of the value of using free response data. As a result of the above procedure, 120 items were written by the six writers. Sixty of the items were prepared without knowledge of free response data, and sixty items were prepared with such knowledge. For convenience, the set of items written with knowledge of free response data will hereafter be referred to as K-items, and those written with no knowledge of free response data will be called N-items.


Administration and Scoring of the Items 


The sixty K-items and the sixty N-items written under the procedure described above were used to construct two forms (A and B) of a choice-type test. The two forms of the choice-type test were constructed by alternating N-items and K-items. The result of this step was that on Form A, the first item was an N-item, the second a K-item, the third an N-item, and so on. On Form B the first item was a K-item, the second an N-item, the third a K-item, and so on. Each form thus had thirty K-items and thirty N-items. The items were so arranged that the first item on each form dealt with the same topic. The alternation of the N-items and K-items was done so that responses to both types of items could be secured for any student taking the test. The forms themselves were distributed alternately to the students taking the choice-type test.
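The counterbalanced layout of the two forms can be sketched as follows. This is a minimal illustration, assuming (as the description implies but does not state outright) that each of the sixty topics has exactly one N-item and one K-item version; the topic numbers and the tuple representation are hypothetical.

```python
def build_forms(n_topics=60):
    """Return (form_a, form_b) as lists of (topic, item_type) pairs.

    Both forms cover the same topics in the same order. Form A starts with an
    N-item and alternates; Form B starts with a K-item and alternates, so each
    topic's N-item and K-item appear on opposite forms.
    """
    form_a, form_b = [], []
    for topic in range(1, n_topics + 1):
        a_type = "N" if topic % 2 == 1 else "K"
        b_type = "K" if a_type == "N" else "N"
        form_a.append((topic, a_type))
        form_b.append((topic, b_type))
    return form_a, form_b


form_a, form_b = build_forms()
print(form_a[:4])  # [(1, 'N'), (2, 'K'), (3, 'N'), (4, 'K')]
print(form_b[:4])  # [(1, 'K'), (2, 'N'), (3, 'K'), (4, 'N')]
```

With this arrangement every student answers thirty items of each kind, which is what allows separate N-item and K-item scores to be computed for each student.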

The items written under the two conditions were administered when the tests were given to 303 new men and women students entering the College of Liberal Arts, State University of Iowa, in February and June 1954. One hundred fifty-one students took Form A and one hundred fifty-two students took Form B of the choice-type test.

Three scores on the choice-type items were obtained for each student. The first score was the total number of items correct, the second was the total of N-items correct, and the third was the total of K-items correct.

All three scores were based on number right. 
No correction for guessing was applied to any 
score. 


RESULTS 


Because of the method employed in constructing and distributing Forms A and B, it was anticipated that they would yield similar distributions of scores. The median scores and total ranges of scores were about the same for both forms. Hence the two forms of the choice-type test, and the two samples of students tested, appeared to be approximately alike.

The total score on the test for any student consisted of the sum of his scores on the N-items and the K-items. Figure 1 presents the distribution of scores on the two types of items. To make this comparison, frequency distributions were made of N-item and K-item scores for each student, regardless of which form he took. This graphical comparison shows that the scores on the N-items tended to be higher than those on the K-items.


Comparison of Discrimination and 
Difficulty Indices 


Indices of discrimination and difficulty were calculated for the N-items and for the K-items. The data were secured by the Upper-Lower Method (3). This method consists of selecting the highest and lowest 27 percent of scores in a group of test papers as the criterion groups. The item discrimination index is the ratio of the difference between the numbers answering the item correctly in the upper and lower groups to the number in either group. The item difficulty index is the ratio of the number in both groups marking the item correctly to the total number in both criterion groups; this ratio is converted to a percent by multiplying by 100.
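The two Upper-Lower formulas can be illustrated with a short Python sketch. The function name and inputs are hypothetical; ties at the group boundaries are broken by the original paper order, a detail the text does not specify. With the 303 papers of this study, 27 percent gives criterion groups of about 82 papers each.

```python
from typing import List, Tuple


def upper_lower_indices(item_correct: List[bool], total_scores: List[int],
                        fraction: float = 0.27) -> Tuple[float, float]:
    """Return (discrimination, difficulty) for one item by the Upper-Lower method.

    item_correct[i] says whether paper i answered the item correctly;
    total_scores[i] is that paper's total test score.
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("need one item response per test paper")
    n = round(fraction * len(total_scores))  # size of each criterion group
    if n == 0:
        raise ValueError("too few papers to form criterion groups")
    # Rank papers from highest to lowest total; ties keep their original order.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:n], order[-n:]
    upper_right = sum(1 for i in upper if item_correct[i])
    lower_right = sum(1 for i in lower if item_correct[i])
    discrimination = (upper_right - lower_right) / n           # difference over the number in either group
    difficulty = 100 * (upper_right + lower_right) / (2 * n)   # percent right in both groups together
    return discrimination, difficulty


# Hypothetical example: ten papers, criterion groups of three.
scores = [30, 28, 25, 22, 20, 18, 15, 12, 10, 5]
correct = [True, True, False, True, True, False, True, False, False, True]
d, p = upper_lower_indices(correct, scores, fraction=0.3)
print(round(d, 2), p)  # 0.33 50.0
```

Note that the difficulty index, so defined, is a percent correct: a lower value means a harder item.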

FIGURE 1

DISTRIBUTION OF RAW SCORES ON ITEMS WRITTEN WITHOUT FREE RESPONSE DATA (N-ITEMS) AND ON ITEMS WRITTEN WITH AID OF FREE RESPONSE DATA (K-ITEMS) FOR ALL INDIVIDUALS IN SAMPLE TAKING CHOICE-TYPE TESTS

[Figure: raw-score distributions for N-items and K-items, N = 303 for each. Values shown in the figure:]

                 N-Items   K-Items
    High           27.0      25.0
    Q3             21.9      20.3
    Median         18.1      16.9
    Q1             14.4      13.4
    Low             6.0       4.0

The distributions of the discrimination and difficulty indices for the N-items and the K-items are presented in Figure 2. The length of the rectangular bar represents the inter-quartile range, and the horizontal line running across the rectangle represents the median. Inspection of this figure shows that the median discrimination index for the N-items was slightly higher than for the K-items. The inter-quartile range was a little greater for the K-items than for the N-items, as was the total range. It may also be observed from Figure 2 that the median difficulty indices for the two methods of item writing were about the same, while the inter-quartile range was a little larger for the K-items than for the N-items. In summary, the differences between the two types of items in discrimination and difficulty shown in Figure 2 appeared to be small.

A more precise comparison of the effects of the two methods of writing on the discrimination and difficulty indices was made through analysis of variance. For this purpose, a “treatment by subjects” analysis described by Lindquist (4) was employed. Individual writer means for both discrimination and difficulty are presented in Table I, along with the general mean for each method of item writing and for all items. The summary table for the analysis of variance of the discrimination indices is presented in Table II. The test of significance for the main effect (the two methods of item writing) was made by obtaining the ratio of the mean square for methods to the mean square for interaction (methods by writers). The hypothesis tested is the null hypothesis, i.e., that the mean discrimination index for a large population of items written with knowledge of free response data is the same as that for a large population of similar items written without that knowledge. If the hypothesis cannot be rejected at the five percent level, then any difference found between the mean values may be attributed to sampling fluctuations. The obtained F-ratio was .22, far below the critical value of 6.61 for 1 and 5 degrees of freedom, and so clearly not significant. Therefore, the null hypothesis is tenable.
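The partition behind Tables II and III can be sketched in Python for a writers-by-methods table with one value (a writer's mean index) in each cell, which is how the text describes the criterion measures. This is an illustrative reconstruction of a two-way analysis without replication, not the author's worksheet; the cell means below are hypothetical (indices multiplied by 100), not the values of Table I.

```python
from typing import Dict, List


def methods_by_writers_anova(cell_means: List[List[float]]) -> Dict[str, float]:
    """Writers x methods ANOVA with one value (a cell mean) per cell.

    cell_means[w][m] is writer w's mean index under method m. The methods
    effect is tested against the methods-by-writers interaction mean square,
    so the F-ratio refers to a population of writers like those sampled.
    """
    n_w, n_m = len(cell_means), len(cell_means[0])
    grand = sum(map(sum, cell_means)) / (n_w * n_m)
    method_means = [sum(row[j] for row in cell_means) / n_w for j in range(n_m)]
    writer_means = [sum(row) / n_m for row in cell_means]

    ss_m = n_w * sum((x - grand) ** 2 for x in method_means)
    ss_w = n_m * sum((x - grand) ** 2 for x in writer_means)
    ss_total = sum((x - grand) ** 2 for row in cell_means for x in row)
    ss_mw = ss_total - ss_m - ss_w  # interaction: variation left after methods and writers

    df_m, df_w = n_m - 1, n_w - 1
    df_mw = df_m * df_w
    ms_m, ms_mw = ss_m / df_m, ss_mw / df_mw
    return {"ss_m": ss_m, "ss_w": ss_w, "ss_mw": ss_mw, "ss_total": ss_total,
            "df_m": df_m, "df_w": df_w, "df_mw": df_mw,
            "F_methods": ms_m / ms_mw}


# Hypothetical cell means (x 100) for six writers; columns are N-items, K-items.
means = [[42, 26], [36, 28], [30, 44], [34, 30], [38, 32], [24, 26]]
result = methods_by_writers_anova(means)
print(result["df_m"], result["df_mw"], round(result["F_methods"], 2))  # 1 5 0.52
```

With six writers and two methods the degrees of freedom come out as 1, 5, and 5 (total 11), matching Tables II and III; the resulting F would then be compared with the tabled F.05(1, 5) = 6.61.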

Using the mean square for interaction as the 
error term in this analysis permits one to gen- 
eralize to a population of item writers of which 
those used in the study are a random sample. In 
view of the results of the test, it may be said 
that with items and item writers like those used 
in the study, differences in mean item discrim- 
ination, if any, are not large enough to be estab- 
lished in a study of this scope and precision. 

Normally, no test need be made of the methods by writers interaction even though more than one measure is available for each writer. For, as Lindquist points out, an intrinsic methods by writers interaction is taken for granted in this experimental design. That is, one may be relatively certain that the methods have a different effect from writer to writer, or that one writer can use the free response data better than another. However, it is easy to test this interaction, and in the present experiment it is also instructive. The test for significance of the interaction is the ratio of the mean square for methods by writers to the mean square within cells. The obtained F-ratio was 2.569 with 5 and 108 degrees of freedom, which exceeds the tabled value of 2.30 and is therefore significant at the 5 percent level. On the basis of this result, it can be said that item writers like those in this study differ in their ability to use free response data effectively.

The analysis of variance procedures used with 
the discrimination indices were also applied to the
difficulty indices for the same items. The writer 
means, methods means, and the general mean of 
the difficulty indices are also given in Table I. 
The summary table for the analysis of variance, 
using the mean values as criterion measures, is 
given in Table III. 

The test of the null hypothesis with respect to the effect of free response data on difficulty indices was made as before, using the ratio of the mean square for methods to the mean square for the methods by writers interaction. The obtained F-ratio was 2.16 with 1 and 5 degrees of freedom. Since this is below the critical value of 6.61, the hypothesis that the mean difficulty indices for the items written by the two methods are the same is tenable at the 5 percent level.

Since there were also ten difficulty indices 
available for each writer under each method, it 
was possible to compute a within cells mean 
square, and to use this as an error term to test 
the methods by writers’ interaction. The obtained 
F-ratio was .406, which obviously is not significant. If the effects of the use of free response data on item difficulty differ among item writers, these differences are not large enough to be demonstrated in the present study.
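As a check on the arithmetic, the printed F-ratios can be re-derived from the mean squares reported in Tables II and III. The helper name `f_ratio` is illustrative; the critical values 6.61 and 2.30 are the ones printed in the tables, not computed here.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F-ratio: mean square for the effect over the mean square used as its error term."""
    return ms_effect / ms_error


# Discrimination indices (Table II): methods tested against the interaction.
f_methods_disc = f_ratio(15.87, 73.13)

# Difficulty indices (Table III).
ms_methods = 30.40 / 1          # SS / df
ms_interaction = 70.48 / 5      # SS / df = 14.096, printed as 14.09
ms_within_cells = 34.70         # printed within-cells mean square

f_methods = f_ratio(ms_methods, ms_interaction)           # compare with F.05(1, 5) = 6.61
f_interaction = f_ratio(ms_interaction, ms_within_cells)  # compare with F.05(5, 100) = 2.30
print(round(f_methods_disc, 2), round(f_methods, 2), round(f_interaction, 3))  # 0.22 2.16 0.406
```

All three ratios fall below their critical values, which is why both null hypotheses for difficulty, and the one for discrimination methods, are retained.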


Comments by the Item Writers 


The directions for the second set of items to 
be constructed asked each writer to make a brief 
statement expressing his opinion of the value of 
free response data in writing choice-type items. 

Although all of the writers stated that the free response data had been of some help, there was not complete agreement as to its value. Writers C, D, and E felt that its value was rather limited and that items written without the data would have been nearly the same as those written with knowledge of free response data.


SUMMARY 


The purpose of this study was to investigate the value of using free response data as a source of distracters in constructing choice-type items. Specifically, comparisons were made between the discrimination and difficulty indices for the items constructed by a group of six item writers both




FIGURE 2

DISTRIBUTION OF DISCRIMINATION AND DIFFICULTY INDICES FOR ITEMS WRITTEN WITHOUT AID OF FREE RESPONSE DATA (N-ITEMS) AND FOR ITEMS WRITTEN WITH AID OF FREE RESPONSE DATA (K-ITEMS)

[Figure: bars for N-items and K-items on a discrimination scale (-.10 to .90) and a difficulty scale (0 to 100). Medians: discrimination .33 (N-items) and .28 (K-items); difficulty 57 (N-items) and 55 (K-items). Difficulty quartiles: upper 70 and 70, lower 45 and 39. Other values are not legible in the source.]


TABLE I

MEAN DISCRIMINATION AND DIFFICULTY INDICES FOR ITEMS WRITTEN WITHOUT AID OF FREE RESPONSE DATA (N-ITEMS) AND FOR ITEMS WRITTEN WITH AID OF FREE RESPONSE DATA (K-ITEMS) FOR INDIVIDUAL ITEM WRITERS

                        Discrimination                 Difficulty
    Writers      N-Items  K-Items  All Items   N-Items  K-Items  All Items
    A              .45      .26      .36       ------- 60, 62 -------
    B              .35      .26      .31          60       53       56
    C              .32      .45      .39       ------- 59, 62 -------
    D              .30      .30      .30       ------- 56, 59 -------
    E            ------- .38, .31 -------      ------- 49, 51 -------
    F            ------- .19, .29 -------      ------- 55, 51 -------
    General
    Means          .33      .31      .33          58       55       57

    Where a row is incomplete in the source, the surviving values are given in their printed order within the column group.


TABLE II

SUMMARY TABLE FOR ANALYSIS OF VARIANCE FOR DISCRIMINATION INDICES

                                          Degrees of   Sums of    Mean
    Sources                               Freedom      Squares    Squares
    Methods (m)                              1          15.87      15.87
    Writers (w)                              5         261.77      52.35
    Methods-Writers Interaction (mw)         5         365.64      73.13
    Total                                   11         643.27

    Methods: F = MSm / MSmw = 15.87 / 73.13 = .22;   F.05(1, 5) = 6.61
    Interaction: F = MSmw / MSw.cells = 2.569;   F.05(5, 100) = 2.30




TABLE III

SUMMARY TABLE FOR ANALYSIS OF VARIANCE FOR DIFFICULTY INDICES

                                          Degrees of   Sums of    Mean
    Sources                               Freedom      Squares    Squares
    Methods (m)                              1          30.40      30.40
    Writers (w)                              5         257.00      51.40
    Methods-Writers Interaction (mw)         5          70.48      14.09
    Total                                   11         357.88

    Methods: F = MSm / MSmw = 30.40 / 14.09 = 2.16;   F.05(1, 5) = 6.61
    Interaction: F = MSmw / MSw.cells = 14.09 / 34.70 = .406;   F.05(5, 100) = 2.30




with and without knowledge of the students' responses to these same items in free response form.

The following results were obtained:

1. The frequency distributions of discrimination indices for the items written with and without the free response data were similar. Items written with free response data appeared to be slightly less discriminating than those written without the response data, but the difference was not significant.

2. The frequency distributions of difficulty indices for the items constructed with and without the use of free response data were also similar. Items written with free response data appeared to be slightly more difficult than items written without the data, but the difference was not significant.

3. Different item writers appeared to differ in the effectiveness with which they utilized free response data to write discriminating test items.

4. Statements made by writers show that some felt that the free response data was helpful, while others felt they could do just as well without such data as with it.


REFERENCES

1. Kelley, Victor H. “An Experiment with Multiple-Choice Vocabulary Tests Constructed by Two Different Procedures,” Journal of Experimental Education, V (March …), pp. 249-50.

2. Frederiksen, Norman, and G. A. Slatter. “The Construction and Validation of an Arithmetical Computational Test,” Educational and Psychological Measurement, XIII (1953), pp. 300-37.

3. Johnson, A. Pemberton. “Notes on a Suggested Index of Item Validity: The U-L Index,” Journal of Educational Psychology, XLII (December 1951), pp. 499-504.

4. Lindquist, E. F. Design and Analysis of Experiments in Psychology and Education (Boston: Houghton Mifflin, 1953), Ch. 6.




JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, December 1958) 


ANALYSIS OF CORNELL ORIENTATION INVENTORY 
ITEMS ON STUDY HABITS AND THEIR 
RELATIVE VALUE IN PREDICTION 
OF COLLEGE ACHIEVEMENT*


PARVIZ CHAHBAZI ** 
Cornell University 


SINCE THE end of World War II, several studies have been conducted at Cornell University to provide the means for more efficient prediction of college achievement. Originally, in 1942, three tests, the Ohio State Psychological, the Cooperative Natural Science and the Cooperative Mathematics, were administered to all entering students. The first group studied was composed of 324 students enrolled as freshmen between 1942 and 1946 at the New York State College of Agriculture at Cornell University. When each predictive variable was studied with the influence of all the other variables held constant, only the secondary school average and the Ohio State Psychological Test score proved to be of any real value in predicting academic achievement (Table I). The fourth-order partial correlation coefficient of the college average with the secondary school average was .353, and with the Ohio State Psychological score .208, for all groups combined. The Cooperative Mathematics Test and the Cooperative Natural Science Test were so highly correlated with the Ohio State Psychological and with the secondary school average that their separate influence in predicting the first-term college average was negligible. The multiple correlation coefficient between the first-term average and the team of the secondary school average and the three tests, all weighted scores, was .57 for all students and .64 for those enrolled in the general curriculum of the College of Agriculture.

In a second study,¹ the secondary school average was still the best single measure for predicting success in college. The Ohio State Psychological continued to be the best single test in the battery used. Its correlation with the first-term college average was found to be .36, compared to .45 in the previous study.

In a follow-up of these two studies the Cornell Mathematics Test was substituted for the Cooperative Mathematics Test. Its correlation with the first-term average was found to be .307, compared to .251 for the Cooperative Mathematics Test in the first study. Apart from the secondary school average, the Cooperative Science Test had the highest correlation coefficient (.379) with the first-term average. When combined with the secondary school average, and both weighted, the Cooperative Science Test score and the secondary school average, derived from New York State Regents Examination scores, had a multiple correlation coefficient of .5251 with the first-term average.²

*All footnotes will be found at end of article.

In an unpublished follow-up of these studies the 
coefficient of correlation between the first-term 
average and the Cornell Mathematics Test scores 
was raised from .307 to .368 and between the Co-
operative Science Test scores and the first-term
average from .379 to .469. The Ohio State Psy- 
chological correlated .407 with the first-term av- 
erage compared to .371 in the third study. This 
fourth study added the Cooperative Reading Test 
to the battery of tests already in use at the New 
York State College of Agriculture at Cornell Uni- 
versity. The Test as a whole correlated .454 
with the first-term average and its Speed of Com- 
prehension part correlated .461. A multiple cor- 
relation coefficient of .617 was found between 
first-term average and the team of secondary 
school average, the average derived from the Re- 
gents Examination scores, and the Speed of Com- 
prehension score of the Cooperative Reading Test. 

Table II shows a comparison of the findings of 
these previous studies with those of the present 
study. 


Cornell Orientation Inventory 


In the first study mentioned above,³ poor health, lack of self-discipline, inability to organize time and material, too much outside work, and too many extracurricular activities were mentioned as possible causes of failure in college. Therefore, for more efficient prediction of college success, the development and use of an inventory covering these factors was recommended. Such an inventory was developed and given to all the 1948 freshmen of the College of Agriculture. The choices for each of the 17 questions on the original Cornell Orientation Inventory were graded from 1 to 5, one indicating that the factor would not hinder the student's achievement in college and five indicating that the factor would hinder his achievement considerably. The items on the Orientation Inventory were scored by totaling the choices checked by the students. The totals thus obtained correlated -.22 with the first-term average.

[Tables I and II are printed sideways in the original and their entries are not legible in the source. Table I: the relationship between college average and each factor when the influence of the remaining factors is held constant, as shown by partial correlation. Table II: a comparison of the correlations between first-term average and the tests used, 1942-53.]

The present study began with the 1951 freshmen, and its data include complete records of freshmen entering the New York State College of Agriculture at Cornell University in September 1951, 1952 and 1953. The 1949 freshmen were given a Revised Cornell Orientation Inquiry⁴ consisting of 26 items, each having a five-choice answer, instead of the 17-item Inventory given to the 1948 students. In 1950 six more items were added to the 26-item Inquiry, and the resulting 32-item Cornell Orientation Inventory was used without any further changes with the 1951, 1952 and 1953 freshmen. Approximately 6 percent of the total number of students in these three classes were eliminated because their scores on the Orientation Inventory or on one or more of the four tests used for predictive purposes, or their secondary school or college averages, were not available. Our sample in this study, therefore, is a total of 813 students in the College of Agriculture at Cornell University, comprising 94 percent of the entire freshman classes in the academic years 1951-52, 1952-53 and 1953-54.


Purpose 


The purpose of this study was to determine the validity of the items on the Orientation Inventory relative to the validity of several aptitude and achievement tests and of secondary school averages, and to develop a multiple-regression equation for predicting first-term college averages for the 1951, 1952 and 1953 freshman classes in the College of Agriculture at Cornell University.


Procedure 


Secondary school averages and scores on four 
tests were used for the development of a prelim- 
inary regression equation. The tests were: 
(1) Cooperative Reading Test, the Speed of Com- 
prehension part, (2) Cornell Mathematics Test, 
(3) Cooperative Science Test, and (4) Ohio State 


Psychological Test.⁸ All the standardized tests
were administered according to instructions 
in their manuals. 

The first-term college averages of the students were used as the criterion variable. No estimates of reliability were available for the marks in any of the courses. For the 813 students, validity data are presented in Table III, which gives the means and standard deviations, the validities, and the intercorrelations of the predictors. From these data, with the exception of the data for the Cornell Orientation Inventory, the multiple-regression equation

$$X_c = .388X_1 + .077X_2 + .157X_3 + .115X_4 + .007X_5 + 22.783$$

was developed for predicting first-term averages (1, 2). The multiple correlation coefficient (R) for this equation was found to be .536. The coefficient of multiple determination (R²) was .287, which shows that we have accounted for 28.7 percent of the variance of freshman averages.
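Applying the first equation is a weighted sum plus a constant, as in the Python sketch below. Which test or average corresponds to each of X1 through X5 is not spelled out in this passage, so the inputs are kept generic, and the student's scores in the example are hypothetical. The last lines show that the coefficient of multiple determination is simply the square of R.

```python
from typing import Sequence

# Weights and constant of the first regression equation (five predictors).
WEIGHTS = (0.388, 0.077, 0.157, 0.115, 0.007)
CONSTANT = 22.783


def predict_first_term_average(x: Sequence[float]) -> float:
    """Predicted first-term average from predictor scores X1..X5, in equation order."""
    if len(x) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} predictor scores")
    return CONSTANT + sum(b * xi for b, xi in zip(WEIGHTS, x))


# Hypothetical predictor scores for one student.
print(round(predict_first_term_average([85, 60, 70, 50, 100]), 3))  # 77.823

# Coefficient of multiple determination = R squared.
R = 0.536
print(round(R ** 2, 3))  # 0.287
```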

On the basis of this regression equation, first-term averages were predicted for all 813 students. If the actual first-term average of any student was greater or less than his predicted first-term average by five or more points, the Orientation Inventory for that student was sorted into one of two separate groups. One group was composed of those students scoring more than five points above the average predicted for them by the equation; these were called the “over-achievers.” The second group was composed of those individuals scoring more than five points below their predicted average; these students were called the “under-achievers.” The responses made on the Orientation Inventory by these two groups were then analyzed item by item, by counting the number of responses made by members of each group to each of the five possible choices in every item. In computing the chi-squares, adjacent frequencies were combined so that no expected frequency fell below five, in keeping with the requirement that expected frequencies not be too small.
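The grouping and item analysis described above can be sketched in Python. The text gives the margin both as "five or more points" and as "more than five points"; this sketch uses five or more. The pooling rule (merge choices from option 1 upward until both expected frequencies reach five, folding any sparse remainder into the last group) is one reasonable reading of "adjacent frequencies have been combined", not necessarily the exact procedure used. All counts are hypothetical.

```python
from typing import List, Optional, Tuple


def classify(actual: float, predicted: float, margin: float = 5.0) -> Optional[str]:
    """Label a student by the gap between actual and predicted first-term average."""
    residual = actual - predicted
    if residual >= margin:
        return "over"
    if residual <= -margin:
        return "under"
    return None  # within the margin: not used in the item analysis


def pool_adjacent(over: List[int], under: List[int],
                  min_expected: float = 5.0) -> List[Tuple[int, int]]:
    """Merge adjacent answer choices until every expected frequency is >= min_expected."""
    n_over, n_under = sum(over), sum(under)
    n = n_over + n_under
    smaller = min(n_over, n_under)  # the smaller group has the smallest expected counts
    if smaller == 0:
        raise ValueError("both groups need at least one response")
    pooled: List[Tuple[int, int]] = []
    cur_o = cur_u = 0
    for o, u in zip(over, under):
        cur_o, cur_u = cur_o + o, cur_u + u
        if (cur_o + cur_u) * smaller / n >= min_expected:
            pooled.append((cur_o, cur_u))
            cur_o = cur_u = 0
    if cur_o or cur_u:  # leftover choices too sparse on their own
        if pooled:
            last_o, last_u = pooled.pop()
            pooled.append((last_o + cur_o, last_u + cur_u))
        else:
            pooled.append((cur_o, cur_u))
    return pooled


def chi_square(pooled: List[Tuple[int, int]]) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom for a 2 x k table."""
    n_over = sum(o for o, _ in pooled)
    n_under = sum(u for _, u in pooled)
    n = n_over + n_under
    stat = 0.0
    for o, u in pooled:
        column = o + u
        for observed, row_total in ((o, n_over), (u, n_under)):
            expected = row_total * column / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(pooled) - 1


# Hypothetical counts of over- and under-achievers choosing options 1-5 on one item.
over = [20, 15, 10, 3, 2]
under = [8, 12, 14, 9, 7]
table = pool_adjacent(over, under)
stat, df = chi_square(table)
print(table, round(stat, 2), df)  # [(20, 8), (15, 12), (10, 14), (5, 16)] 11.9 3
```

Here options 4 and 5 are merged, leaving 3 degrees of freedom; a chi-square of 11.9 exceeds the 5 percent critical value of 7.815 for 3 degrees of freedom.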


Results 


Eight of the 32 items on the Cornell Orientation Inventory produced very low probabilities on the chi-square tests of significance: items 4, 9, 10, 13, 15, 22, 24, and 25. With the exception of item 10, with a probability of .01, and item 15, with a probability of .02, they all had probabilities below .01. This is the probability, for the given number of degrees of freedom, of obtaining a chi-square value at least as large as the one observed if the over-achievers and under-achievers had in fact responded to the item in the same way, that is, if both distributions had arisen from the same population.

[Tables III and IV are printed sideways in the original and their entries are not legible in the source. Table III: intercorrelations, means, and standard deviations of the criterion and prediction variables. Table IV: beta weight and regression weight for each prediction variable.]

These eight items, which proved to have predictive value, are listed below:


A) Item 4. I neglect certain courses because of lack of interest.
   1. Never  2. Rarely  3. Occasionally  4. Almost always  5. Always

B) Item 9. When in high school it was necessary for me to study at home
   1. Almost every night  2. Several nights per week  3. One or two nights per week  4. Infrequently  5. Never

C) Item 10. It is possible for me to concentrate on my studies
   1. Under almost any conditions  2. When everything is quiet and I am alone  3. After a relatively long warm-up period  4. Only when intensely interested in the material  5. Under practically no conditions

D) Item 13. Choose the statement which best describes how you feel about your note-taking.
   1. Exceptionally neat and efficient: no reorganization necessary  2. Neat and efficient: a little reorganization is sometimes necessary  3. Some reorganization always necessary  4. Considerable reorganization always necessary  5. Impossible to get my notes well organized

E) Item 15. If I am to remain in college, it will be necessary for me to earn
   1. All of my expenses  2. My board and room  3. My board  4. My room  5. A very small part or none of my expenses

F) Item 22. Choose the statement which best describes how you organize your time.
   1. I am usually able to do a little more than is required of me in my courses  2. I am never behind with my regular assignments in my courses  3. I have all that I can do to keep up with the amount of work required in my courses  4. I frequently do not have time to do the minimum amount of work required in my courses  5. I never seem to have time to do the minimum amount of work required in my courses

G) Item 24. Before registering at Cornell, I
   1. Visited the campus many times, since my home is nearby  2. Visited the campus and participated in 4-H or high-school activities  3. Visited the campus to talk about admission  4. Visited the campus to attend athletic events  5. Had only driven through or had never visited the campus

H) Item 25. The average number of hours I spend on study each week is
   1. 30 or more  2. 25-29  3. 20-24  4. 15-19  5. Under 15

With the possible exception of items 15 and 24, all the significant items were in the “Study Habits” area of the Orientation Inventory. The items in the areas of “Motivation” and “Adjustment” do not appear to be as well represented in the final analysis. A partial score was then computed for all 813 students by scoring only the eight items listed above. These partial scores were then correlated with the criterion and with the other prediction variables; the coefficients of these correlations are shown in Table II. Even though the study habits score did not correlate very highly with the criterion measure, the actual coefficient of correlation being only -.257, its correlations with the other variables were much lower, -.100 and below. Because the score overlaps so little with the other predictors, its contribution to the multiple correlation was much greater than its modest correlation with the criterion alone would suggest.
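A partial score of this kind can be sketched in Python. The scoring rule here is an assumption: each of the eight items is assumed to contribute its choice number (1 to 5), by analogy with the original Inventory, which was scored by totaling the choices checked. The actual key for the study habits score is not given in this passage, and some items (such as item 15) may well be keyed differently. The answer sheet is hypothetical.

```python
from typing import Dict

# The eight items found significant in the chi-square analysis.
STUDY_HABITS_ITEMS = (4, 9, 10, 13, 15, 22, 24, 25)


def study_habits_score(choices: Dict[int, int]) -> int:
    """Sum of the 1-5 choice numbers on the eight items (assumed scoring rule).

    choices maps item number -> choice number for one student's answer sheet.
    """
    total = 0
    for item in STUDY_HABITS_ITEMS:
        choice = choices[item]  # KeyError if the item was left unanswered
        if not 1 <= choice <= 5:
            raise ValueError(f"item {item}: choice must be 1-5, got {choice}")
        total += choice
    return total


# Hypothetical answer sheet: choice 2 on every one of the 32 items.
sheet = {item: 2 for item in range(1, 33)}
print(study_habits_score(sheet))  # 16
```

Under this rule the partial score ranges from 8 to 40, whatever the student marks on the other 24 items.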

From these correlations and intercorrelations (as shown in Table III), beta weights and regression weights were obtained for each prediction variable. These are presented in Table IV. The previous regression equation was now changed into the following:

$$X_c = .366X_1 + .074X_2 + .150X_3 + .110X_4 + .007X_5 - .539X_6 + 36.794$$

which has a slightly smaller positive weight for each of the previous prediction variables and a rather large negative weight for the study habits score of the Orientation Inventory. This rather large regression weight for the Orientation Inventory is due, first, to its very low intercorrelations with the other prediction variables and, second, to its very small standard deviation compared with the standard deviation of the criterion variable ($s_c/s_6 = 2.799$).
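The effect of the small standard deviation can be seen from the standard conversion of a beta weight to a raw-score regression weight, $b_6 = \beta_6 (s_c / s_6)$. Using the beta coefficient (-.1925) and the ratio (2.799) reported in the text reproduces the printed weight; the function name is illustrative.

```python
def raw_weight(beta: float, sd_ratio: float) -> float:
    """Raw-score regression weight b = beta * (s_c / s_x), with sd_ratio = s_c / s_x."""
    return beta * sd_ratio


b6 = raw_weight(-0.1925, 2.799)
print(round(b6, 3))  # -0.539
```

The same beta with equal standard deviations (a ratio of 1) would give a weight of only -.1925, so the ratio alone nearly triples the raw weight.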


The multiple correlation coefficient (R) for this second multiple-regression equation was found to be .569, compared to .536 for the first equation. The coefficient of multiple determination this time was .323, compared to .287 for the first equation, which shows that we have accounted for 32.3 percent of the variance of freshman averages by including the score on the study habits part of the Cornell Orientation Inventory among the prediction variables, compared to the 28.7 percent accounted for by the previous equation without the Orientation Inventory score.


Summary and Conclusions 


Although the coefficient of correlation between the first-term average and the study habits score of the Cornell Orientation Inventory was merely -.257, the intercorrelations between this score and the other prediction variables were very low, -.100 and lower, which resulted in a rather large regression weight, -.539, and a large beta coefficient, -.1925, for this score. In the light of these facts, it would appear that

1. Research in this area should be continued.
2. The scope of the study habits part of the Cornell Orientation Inventory might be increased.
3. Other items having to do with college adjustment and motivation might be constructed.


FOOTNOTES

* The tests in this study were administered in 1951, 1952, and 1953 by the Cornell University Testing Service. The Cornell Orientation Inventory was originally developed in 1948 by Dr. John P. Hertel and Dr. Francis J. DiVesta.

** Now Assistant Professor of Psychology at Lake Erie College, Painesville, Ohio.

1. Parviz Chahbazi. “Prediction of Achievement in a College of Agriculture,” Educational and Psychological Measurement, XV (Winter …), pp. …

2. Parviz Chahbazi. “Use of Projective Tests in Predicting College Achievement,” Educational and Psychological Measurement, XVI …

3. Francis J. DiVesta, Asahel D. Woodruff and John P. Hertel. “Motivation as a Predictor of College Success,” Educational and Psychological Measurement, … (Autumn …), pp. 339-48.

4. Robert L. Egbert and Glenn R. Hawkes. “Use of an Orientation Inquiry as an Aid in Predicting Success in College Agriculture Curriculum,” Journal of Educational Research, XLIV (December 1950), pp. 295-302.

5. Hertel and DiVesta, op. cit., p. 394.

6. DiVesta et al., op. cit., p. 342.

7. Egbert and Hawkes, op. cit., p. 295.

8. For detailed descriptions of these tests see footnotes 1, 2, 3.


JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, December 1958) 


THE RELATIONSHIP OF PEER GROUP RATING TO 
CERTAIN INDIVIDUAL PERCEPTIONS 
OF PERSONALITY 


J. W. RAMEY 
Teachers College, Columbia University 


TO WHAT degree will doctoral students who 
are working closely with one another in a depart- 
ment think of themselves as a group? This ques- 
tion came to mind as a result of reviewing ar- 
ticles by Cattell and Cottrell (2,3). The tendency 
of individuals to adopt a ‘‘we’’ attitude when they 
are working toward common goals, and to take 
on certain syntality traits, such as aggressive- 
ness against groups, isolationism, or reliability 
in commitments, might well be strong enough to 
influence such students. The normative influence 
investigated by Asch, Gorden, Homans, Shonbar, 
and Sherif (1, 4, 6, 8, 9), might also contribute to
the unconscious adoption of group norms, with- 
out the individual’s awareness that they were be- 
ing influenced. 

A group of students particularly well situated 
for study of this question were those doctoral stu- 
dents in one department of a large graduate 
school who were in full-time residence and holders of departmental assistantships. These students were thrown together both in and out of
class, during much of the day and evening, where- 
as most of the doctoral students in this depart- 
ment held full-time outside jobs and were on 
campus only while attending class in the evening. 

If several of the students who were departmental assistants were arbitrarily designated as a “Peer Group” and asked to rate themselves, each other, and all doctoral students in the department (Others), would they tend to superimpose the image of their “Peers” onto “Others,” or would they react as though there actually were a differentiated “Peer Group”? Does the tendency of individuals to adopt group norms influence such individuals to rate themselves more like their “Peers” or more like “Others,” or will there be no difference?

This paper reports an experiment carried out 
for the purpose of answering these questions. 
Specifically, two hypotheses were examined: 


1. Self ratings will correlate as significantly with ratings of most doctoral students (Others) as with ratings of “Peers”.

2. Peer ratings will correlate as significantly with ratings of most doctoral students (Others) as with ratings of Self.


Six scale scores from the California Psychological Inventory were used to secure ratings for this experiment. Of the eighteen scales provided in this Inventory, these seemed the most pertinent to this use. Five of the six happen to be the most reliable and discriminating scales of the eighteen, based on the data provided in the test manual. Communality, the sixth scale used, was chosen on the basis of the test authors' claim that it identifies random answers, a potential problem in this experiment.

The six scales are identified in the CPI test 
manual as follows: 


Dominance: Assesses those factors of leadership
ability, dominance, persistence and social ini- 
tiative expressed by such words as: aggressive, 
confident, persistent, planful; as being persua- 
sive and verbally fluent; as self-reliant and in- 
dependent; and as having leadership potential 
and initiative. 


Capacity for status: An index of an individual’s 
capacity for status (not his actual or achieved 
status). The personal qualities and attributes 
which underlie and lead to status, such as: being ascendant and self-seeking; effective in communication; ambitious, active, forceful, insightful, resourceful, and versatile, having
personal scope and breadth of interests. 


Socialization: The degree of social maturity, in- 
tegrity, and rectitude which the individual has 
attained, for instance: serious, honest, modest, 
sincere, conscientious and responsible; self-denying and conforming, and industrious.


Communality: The degree to which an individual’s reactions and responses are dependable, moderate, tactful, reliable, sincere, patient, steady, and realistic; having common sense and good judgment.




Intellectual Efficiency: To what degree is the individual efficient, clear-thinking, capable, intelligent, progressive, thorough, resourceful; alert and well informed; placing a high value on cognitive and intellectual matters?


Flexibility: How adaptable is the person’s think- 
ing and social behavior? Is he insightful, in- 
formal, adventurous, confident, humorous, 
rebellious, idealistic, assertive, and egotis- 
tic; sarcastic and cynical; highly concerned 
with personal pleasure and diversion? 


Thirteen subjects were initially selected from among the departmental assistants for inclusion in the Peer group. One of these subjects was called out of town before having had an opportunity to fill out the rating scales, and was therefore eliminated from the sample. The data thus represent ratings for eleven male doctoral students and one female doctoral student. No apparent problems were introduced through the inclusion of the female subject. All of the subjects had had previous experience in the field of education, were full-time doctoral students in the department of administration, and were departmental assistants.


Methodology 


Subjects were individually asked to help test a theory about the ways in which people respond to personality tests. Each was asked to answer approximately 230 of the 480 questions in the CPI test booklet (those checked with a dash) as he expected most doctoral students in the department would answer them, assuming complete honesty. The answer sheet was identified only by number and the notation ‘‘most people’’.

When the subject finished answering the questions for ‘‘Others’’, he was given a new answer sheet and asked to answer the questions again, for himself. This answer sheet was also numbered and marked ‘‘Self’’. When the subject had completed this portion of the task, he was handed an instruction sheet and a rating scale. The instruction sheet indicated how the rating scale should be filled in, and provided the exact descriptions of the six characteristics shown above, together with the range of ratings for each characteristic, as provided in the CPI test manual. He was asked to rate each of the individuals in the ‘‘Peer’’ group.

In this manner, each individual was rated by his peers in terms of the CPI definitions provided for each of the six scales on which he had rated himself and ‘‘Others’’, and within the same range of possible ratings. Each subject was verbally admonished to familiarize himself with the definitions and be sure he understood them before he rated his peers. He was urged to rate all subjects on one characteristic before moving on to the next characteristic.

All subjects cooperated willingly in the experiment, although it took from 90 to 180 minutes to complete the task. Each subject was asked not to communicate with the others in the ‘‘Peer group’’ regarding the experiment, and so far as is known, this request was honored.


Results 


Most subjects consistently rated Self higher than ‘‘Others’’, and in no instance did any subject consistently rate ‘‘Others’’ higher than Self. Peer rating and Self rating means were higher than the CPI mean for college students, but the ‘‘Others’’ rating mean was lower than the CPI college student mean.

Raw scores were converted into standard scores, using the tables in the CPI manual. The ranges, means, and sums of the squares for each characteristic are indicated in Tables I, II and III.
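Each table entry such as 62 - 3 is read here as a standard score followed by the number of subjects who received it (the frequencies in the Dominance column of Table III, for example, add to twelve). On that reading, the means and sums of squares can be recomputed from the grouped scores. The Python sketch below is only an illustration, not the author's worksheet; the function name is hypothetical, and Fractions are used so the arithmetic stays exact.

```python
from fractions import Fraction


def mean_and_sum_of_squares(distribution):
    """Return (N, mean, sum of squared deviations) for a grouped distribution.

    `distribution` is a list of (standard_score, frequency) pairs, the
    layout assumed for Tables I to III. Fractions keep the arithmetic exact.
    """
    n = sum(freq for _, freq in distribution)
    mean = Fraction(sum(score * freq for score, freq in distribution), n)
    # Sum of squared deviations from the mean: sum of f * (X - M)^2
    ss = sum(freq * (score - mean) ** 2 for score, freq in distribution)
    return n, mean, ss


# Dominance, Peer ratings (Table III): (standard score, number of subjects)
peer_dominance = [(80, 1), (66, 2), (62, 3), (60, 3), (58, 1), (52, 1), (44, 1)]

n, mean, ss = mean_and_sum_of_squares(peer_dominance)
print(n, float(mean), float(ss))  # prints: 12 61.0 796.0
```

For the Peer ratings on Dominance this agrees with the M = 61 and the sum of squares of 796 printed under that column.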

Small-sample statistics and the null hypothesis, using the Fisher t formula for testing the difference between uncorrelated means when the samples are of equal size, seemed appropriate for this experiment. As reported by Guilford (5), this formula is:

$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{\sum x_1^2 + \sum x_2^2}{N_i\,(N_i - 1)}}}$$

where $M_1$ and $M_2$ are the means of the two samples; $\sum x_1^2$ and $\sum x_2^2$ are the sums of the squares in the two samples; and $N_i$ is the size of either sample.

Since $N_i = 12$, there are eleven degrees of freedom, so the critical values of t are 2.201 at the 5% level and 3.106 at the 1% level. Applying the formula, where S = Self rating, O = Others rating, and P = Peer rating, gives the t scores shown in Table IV.
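The t scores in Table IV can be recomputed from the means and sums of squares in Tables I to III. The Python sketch below is a minimal illustration of the formula above, not the author's own calculation; the function names are hypothetical, and the two critical values are simply those quoted in the text for eleven degrees of freedom, hardcoded rather than looked up.

```python
import math

# Critical values quoted in the text for eleven degrees of freedom
T_CRIT_5_PERCENT = 2.201
T_CRIT_1_PERCENT = 3.106


def fisher_t_equal_n(m1, ss1, m2, ss2, n):
    """t for the difference between two uncorrelated means of equal-sized samples.

    m1, m2   -- sample means
    ss1, ss2 -- sums of squared deviations (the sums of squares in the tables)
    n        -- size of either sample
    """
    standard_error = math.sqrt((ss1 + ss2) / (n * (n - 1)))
    return (m1 - m2) / standard_error


def significance(t):
    """Label a t score against the two critical values quoted in the text."""
    if abs(t) >= T_CRIT_1_PERCENT:
        return "1%"
    if abs(t) >= T_CRIT_5_PERCENT:
        return "5%"
    return "n.s."


# Dominance: Self (M = 60, sum of squares 2366) against Others (M = 41, 2326)
t = fisher_t_equal_n(60, 2366, 41, 2326, 12)
print(f"{t:.3f}", significance(t))  # prints: 3.187 1%
```

For Dominance this reproduces the 3.187 reported in Table IV for the Self and Others comparison, a value beyond the 1% critical value.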

These t scores indicate that the null hypothe- 
sis can be rejected with a great deal of confi- 
dence for half of these measures. This state- 
ment applies to Socialization, Communality, and 
Flexibility, when comparing Self rating and Peer 
rating; to Dominance, Capacity for Status, Social- 
ization, Intellectual Efficiency, and Flexibility, 
when comparing Peer rating with ‘‘Others’’ rat- 
ing; and to Intellectual Efficiency when compar- 
ing Self rating to ‘‘Others’’ rating. A rather low 
correlation is also indicated for four of the re- 
maining t scores. 


Tentative Conclusions 


Our hypotheses that 1) Self rating would cor- 
relate as significantly with ratings of most doc- 


TABLE I 


STANDARD SCORES FOR TWELVE DOCTORAL STUDENTS SELF-RATED ON THE CHARACTERISTICS: DOMINANCE, CAPACITY FOR STATUS, SOCIALIZATION, COMMUNALITY, INTELLECTUAL EFFICIENCY, AND FLEXIBILITY; AS DEFINED IN THE CALIFORNIA PSYCHOLOGICAL INVENTORY

        Do      Cs      So      Cm      Ie      Fx
        78-1    73-2    59-1    63-1    67-1    73-1
        76-1    67-1    58-3    58-3    64-2    64-4
        70-1    65-3    54-1    54-4    62-1    61-2
        68-1    62-2    52-1    49-1    60-3    59-1
        66-2    60-1    51-1    45-1    49-3    56-1
        64-1    57-1    49-1    40-2    47-1    53-1
        62-1    46-1    47-1            36-1    44-1
        54-1    39-1    43-1                    41-1
        42-1            40-1
        39-1            35-1
        33-1
M       60      61      50      52      56      59
Σx²     2366    1100    678     745     961     898


TABLE II 


STANDARD SCORES FOR ‘‘OTHERS’’ AS RATED BY TWELVE DOCTORAL STUDENTS ON THE CHARACTERISTICS: DOMINANCE, CAPACITY FOR STATUS, SOCIALIZATION, COMMUNALITY, INTELLECTUAL EFFICIENCY, AND FLEXIBILITY; AS DEFINED IN THE CALIFORNIA PSYCHOLOGICAL INVENTORY

Do Cs So Cm Ie Fx
70-1 62-1 63-1 58-4 58-1 13-1
62-1 60-1 56-1 54-3 56-1 61-1
48-1 57-1 51-1 49-2 54-1 59-2
46-2 49-1 47-1 40-2 43-1 56-1
42-1 44-2 45-2 26-1 34-1 53-2
35-2 41-1 42-1 30-1 50-1
31-1 39-2 38-1 28-1 41-2
29-2 +1 36-1 26-1 39-2
19-1 -1 35-2 23-3
23-1 29-1 17-1
13-1
M = 41 M = 44 M = 44 M = 50 M = 34 M = 53
Σx² = 2326 1564 1056 882 2629 1018


TABLE III

STANDARD SCORES FOR ‘‘PEER GROUP’’ AS RATED BY TWELVE DOCTORAL STUDENTS ON THE CHARACTERISTICS: DOMINANCE, CAPACITY FOR STATUS, SOCIALIZATION, COMMUNALITY, INTELLECTUAL EFFICIENCY, AND FLEXIBILITY; AS DEFINED IN THE CALIFORNIA PSYCHOLOGICAL INVENTORY

        Do      Cs      So      Cm      Ie      Fx
        80-1    80-1    72-2    58-1    69-1    79-2
        66-2    70-2    66-1    49-2    64-1    76-2
        62-3    67-5    65-3    45-4    62-3    73-3
        60-3    65-1    63-3    40-3    60-3    70-4
        58-1    62-1    59-1    35-1    56-2    64-1
        52-1    60-1    58-2    31-1    54-1
        44-1    54-1                    43-1
M       61      66      64      44      59      73
Σx²     796     430     235     548     454     207


TABLE IV

t-SCORES FOR SELF-OTHERS, SELF-PEERS, AND OTHERS-PEERS, FOR THE CHARACTERISTICS: DOMINANCE, CAPACITY FOR STATUS, SOCIALIZATION, COMMUNALITY, INTELLECTUAL EFFICIENCY, AND FLEXIBILITY

        Do      Cs      So      Cm      Ie      Fx
S-P     .2044   1.494   5.384   2.555   .9168   4.827
O-P     4.132   5.677   6.389   1.829   5.172   6.557
S-O     3.187   3.571   1.657   .5681   4.222   1.570


TABLE V

RANK ORDER CORRELATIONS FOR SELF-PEERS, OTHERS-PEERS, AND SELF-OTHERS, FOR THE CHARACTERISTICS: DOMINANCE, CAPACITY FOR STATUS, SOCIALIZATION, COMMUNALITY, INTELLECTUAL EFFICIENCY, AND FLEXIBILITY

Dominance
Capacity for Status
Socialization
Communality
Intellectual Efficiency
Flexibility

SP      OP      SO
.30     .36
-.16    -.21    -.09
-.16    .23     .58
.30
.29     -.05    .14


toral students (Others) as with ratings of ‘‘Peers’’, 
and 2) Peer ratings would correlate as significantly with ratings of most doctoral students (Others) as with ratings of Self, produced unexpected results. First, for this sample, Self ratings correlated more significantly with ratings of most doctoral students (Others) than with ratings of ‘‘Peers’’. Also, Peer ratings correlated more significantly with Self ratings than with ratings of ‘‘Others’’. (See Table IV.)

While it was to be expected that peers would rate each other rather close to their self ratings, an even closer match between self rating and the rating of ‘‘Others’’ was quite unexpected. Re-examination of the procedure used and the ratings obtained brought to light one disturbing factor that might account for such results, and this factor was therefore checked to see how much it had influenced the findings.

Individuals, in their rating of Peers on the 
scale provided, were found to have bunched 
some of their ratings, in a few instances, rat- 
ing as many as four or five peers at the same 
point on the scale. In order to overcome this 
tendency, each subject was again contacted, and 
asked to rank-order the peers on the scale. Rat- 
ings of self and ratings of ‘‘Others’’ were then 
rank-ordered and rank order correlations were 
computed for each of the six characteristics, us- 
ing the Spearman (7) formula: 


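$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where $D$ is the difference between the two ranks assigned to a subject and $N$ is the number of subjects ranked.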


The Spearman rank order correlations obtained 
are shown in Table V. 
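The rank-order step and the Spearman formula can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's procedure: in the study the subjects ranked their peers directly, whereas here ranks are derived from numerical ratings, with tied ratings (the ‘‘bunching’’ described above) sharing the average of the ranks they occupy. The function names and the sample ratings are hypothetical, and when ties are present the simple formula is only an approximation to the correlation between the ranks.

```python
def ranks(values):
    """Rank values from highest (rank 1) to lowest; tied values share the
    average of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end would receive ranks start+1..end+1; share their mean
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = shared_rank
        start = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho from rank differences D: 1 - 6*sum(D^2) / (N(N^2 - 1))."""
    if len(x) != len(y):
        raise ValueError("both lists must rate the same subjects")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical ratings of five subjects by two raters
self_ratings = [70, 65, 60, 55, 50]
peer_ratings = [66, 68, 58, 59, 50]
print(spearman_rho(self_ratings, peer_ratings))  # prints 0.8
```

Ranking from highest to lowest or from lowest to highest gives the same ρ, provided both lists are ranked the same way.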

Again Self rating and Peer ratings correlate much more significantly than Peer rating and rating of ‘‘Others’’, as expected. We also note that Self rating and Peer ratings correlate more significantly than Self rating and ‘‘Others’’ rating, and this seems more plausible than the result obtained with unranked ratings. It is also interesting that although none of the correlations in the OP and SO columns of Table V is very high, the two columns seem to bear a marked resemblance to each other.


Conclusions 


We can conclude, therefore, for this sample, 
and with this instrument, that: 


1. Self rating will correlate more significantly with rating of ‘‘Peers’’ than with rating of most doctoral students (Others).

2. Peer ratings will correlate more signifi- 
cantly with Self ratings than with ratings 
of most doctoral students (Others). 

3. Ratings of most doctoral students (Others) will correlate as significantly with ratings of Self as with ratings of ‘‘Peers’’.


These results certainly do not preclude the 
possibility that with different subjects and/or dif- 
ferent instruments, different results might be ob- 
tained. For this particular group, however, 
there would seem to exist at least the rough 
framework of a peer group, distinguishable 
from the universe of doctoral students in the department. This tendency to identify such a group
might well be entirely due to halo effect caused 
by singling out these particular students for peer 
ranking and rating. The effect of this tendency 
could be checked by repeating the experiment 
with a different group of subjects selected com- 
pletely at random within the department. 


REFERENCES 


1. Asch, S. E. ‘‘Effects of Group Pressure Upon the Modification and Distortion of Judgments,’’ in Group Dynamics (Evanston, Ill.: Row, Peterson and Co., 1953), pp. 151-62.

2. Cattell, Raymond B. ‘‘Concepts and Methods in the Measurement of Group Syntality,’’ in Small Groups (New York: Alfred A. Knopf, ...), pp. 107-26.

3. Cottrell, Leonard S., Jr. ‘‘The Analysis of Situational Fields in Social Psychology,’’ in Small Groups (New York: Alfred A. Knopf, ...), pp. 5/-70.

4. Gordon, R. L. ‘‘Interaction Between Attitude and the Definition of the Situation in the Expression of Opinion,’’ in Group Dynamics (Evanston, Ill.: Row, Peterson and Co., 1953), pp. 163-76.

5. Guilford, J. P. Fundamental Statistics in Psychology and Education, second edition (New York: McGraw-Hill, 1950), p. 228.

6. Homans, George C. The Human Group (New York: Harcourt, Brace and Co., 1958), 484 pp.

7. Nelson, M. J., Denny, E. C., and Coladarci, A. P. (Eds.) Statistics for Teachers (New York: The Dryden Press, 1956), pp. 83-7.

8. Schonbar, R. A. ‘‘The Interaction of Observer-Pairs in Judging Visual Extent and Movement,’’ Archives of Psychology, No. 299 (1945).

9. Sherif, M. ‘‘A Study of Some Social Factors in Perception,’’ Archives of Psychology, No. 187 (1935).




Specifications for Manuscripts 


JOURNAL OF EDUCATIONAL RESEARCH 
and the... 


JOURNAL OF EXPERIMENTAL EDUCATION 


1. All manuscripts must be typewritten, double spaced, and on one side 
of the sheet only. Mimeographed and ditto sheets are acceptable only 
when very clearly printed. 


2. All unusual symbols or formulae must be very clearly typed or hand 
printed in black ink. To avoid costly printers’ composition charges it 
may be necessary for us to make cuts of difficult matter, or to print 
your material by the photo-offset lithography method. The latter means 
photographing your actual copy. It is expensive to have material re- 
drawn by our own artists, and retracing or duplicating increases the 
hazards of error. See that your copy is correct and complete as you
wish to have it reproduced. The men who work on your manuscripts
are not trained to understand the working symbols and language of 
your technical field. 


3. The same restrictions and requirements as in Paragraph 2 apply to all
drawings, graphs or other illustrated materials: they must be neatly
done, in black ink, on bond paper or tracing cloth suitable for repro-
duction. Remember our magazines are printed in black ink only. Color
graphs should be changed by the author to provide different kinds of
shading for the different areas. For example: diagonal lines for red,
vertical lines for blue, etc. Provide a key.


4. All tables, graphs, etc., on sheets by themselves must be properly labeled 
and identified in relation to the written copy of the manuscript. 


5. Footnotes must be complete as to author, title, place of publication, 
publisher, date and pages. They must be numbered consecutively 
throughout the article. 


6. Bibliographical notes must be complete and arranged alphabetically. 


The cooperation of all prospective authors in following these rules is 
earnestly required. It is difficult to produce technical journals accurately, 
neatly, and on time under the best conditions. Promptness in printing, 
economy, and accuracy will be promoted by carefully prepared manuscripts. 


